CBMM: The Science and Engineering of Intelligence
TRANSCRIPT
CBMM, the Science and Engineering of Intelligence
The Center for Brains, Minds and Machines (CBMM) is a multi-institutional NSF Science and Technology Center dedicated to the study of intelligence: how the brain produces intelligent behavior, and how we may be able to replicate intelligence in machines.
Publications: ~500
Research Institutions: ~4
Faculty (CS + BCS + …): ~23
Researchers: ~100
Educational Institutions: 12
Funding 2013-2023: ~$50M
Disciplines (Science + Engineering): Machine Learning, Computer Science, Computational Neuroscience, Cognitive Science
We aim to make progress in understanding the greatest of all problems in science: the problem of intelligence. This means understanding how the brain makes the mind, how the brain works, and how to build intelligent machines. We believe that the science of intelligence will enable better engineering of intelligence in the long term.
CBMM's focus is the Science and the Engineering of Intelligence.
Key recent advances in the engineering of intelligence have their roots in basic research on the brain
The CBMM bet (different from DeepMind):
understand how the brain works, (then) make intelligent machines
The problem of intelligence is the greatest problem in science
EAC- May 2020
CBMM Organizational Chart (future)
Director: Tomaso Poggio
EAC
Managing Director: Kathleen Sullivan (MIT)
Education Coordinator: Ellen Hildreth (WC)
Education Evaluation: Lizanne DeStefano (GT)
KT Coordinator: Boris Katz (MIT)
Diversity Coordinator: Mandana Sassanfar (MIT)
Deputy Director: Gabriel Kreiman (HU)
Associate Director & Trainee Coordinator: Matt Wilson (MIT)
Research Director: Kenneth Blum (HU)
Administrative Assistant
Technology Director
Module 1 (VISUAL STREAM): Tomaso Poggio, Shimon Ullman (MIT)
Module 2 (BRAIN OS): Gabriel Kreiman (HU)
Module 3 (COGNITIVE CORE): Nancy Kanwisher, Joshua Tenenbaum (MIT), Jim DiCarlo
Module 4 (TOWARDS SYMBOLS): Boris Katz, Shimon Ullman (MIT)
CBMM Participants
[Chart: participant counts per year (Year 1 through Year 7), broken down by Faculty, Research Scientists, Postdocs, Grad Students, Undergrads, Staff, Other, and Total]
EAC
Demis Hassabis, DeepMind
Charles Isbell Jr., Georgia Tech
Christof Koch, Allen Institute
Fei-Fei Li, Stanford
Lore McGovern, MIBR, MIT
Joel Oppenheim, NYU
Pietro Perona, Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua, Mobileye
David Siegel, Two Sigma
Susan Whitehead, MIT Corporation
Jim Pallotta, The Raptor Group
Research, Education & Diversity Partners
MIT: Boyden, Desimone, DiCarlo, Kaelbling, Kanwisher, Katz, McDermott, Oliva, Poggio, Roy, Sassanfar, Saxe, Schulz, Tegmark, Tenenbaum, Torralba, Ullman, Wilson
Harvard: Blum, Gershman, Kreiman, Livingstone, Sompolinsky, Spelke
Howard U: Chouika, Manaye, Rwebangira, Salmani
Hunter College: Chodorow, Epstein, Sakas, Zeigler
Johns Hopkins U: Isik
Queens College: Brumberg
Rockefeller U: Freiwald
Stanford U: Goodman
Universidad Central del Caribe (UCC): Jorquera
University of Central Florida: McNair Program
UMass Boston: Blaser, Ciaramitaro, Pomplun, Shukla
UPR-Mayagüez and UPR-Río Piedras: Santiago, Vega-Riveros, García-Arrarás, Maldonado-Vlaar, Mégret, Ordóñez, Ortiz-Zuazaga
Wellesley College: Hildreth, Wiest, Wilmer
Harvard Medical School: Kreiman, Livingstone
Florida International U: Finlayson
Boston Children's Hospital: Kreiman
Museum of Science, Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
KAIST: Sangwan Lee
Corporate: IBM, Honda, Microsoft, Boston Dynamics, Orcam, NVIDIA, Siemens, Schlumberger, Mobileye, Intel, Fujitsu, GE
Videos: ~950 (May 2014 - April 2020)
(YouTube subscribers only - 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes, and Gabriel Kreiman. Cerebral Cortex, 2016.
See more at http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands: people are extremely good at this task, while networks are at near chance-level performance.
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu.
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs, and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM SummerSchool
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
A cognitive theory of basic fly instincts predicts the trajectory of the chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies (beautiful papers on anatomy, optics, and organization of motion perception by Braitenberg, Kirschfeld, Götz).
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
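The correlation scheme described in these bullets can be caricatured in a few lines of NumPy. This is a toy sketch, not the original model: the first-order low-pass filter standing in for the biological delay, the filter constant, and the sinusoidal test stimulus are all illustrative assumptions.

```python
import numpy as np

def reichardt_detector(signal_a, signal_b, tau=5.0, dt=1.0):
    """Minimal Hassenstein-Reichardt correlation detector sketch.

    signal_a, signal_b: luminance time series from two adjacent
    photoreceptors. Each input is low-pass filtered (a stand-in for
    the biological delay) and multiplied with the undelayed signal
    from the other arm; subtracting the mirror-symmetric half makes
    the output direction-selective.
    """
    def lowpass(x):
        # First-order IIR low-pass: y[t] = y[t-1] + (dt/tau)*(x[t]-y[t-1])
        y = np.zeros_like(x, dtype=float)
        alpha = dt / tau
        for t in range(1, len(x)):
            y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
        return y

    # Opponent subtraction of the two half-detectors.
    return lowpass(signal_a) * signal_b - lowpass(signal_b) * signal_a

# A grating drifting from receptor A toward receptor B: B sees the
# same sinusoid as A, phase-delayed.
t = np.arange(0, 200)
a = np.sin(0.2 * t)
b = np.sin(0.2 * t - 0.5)
rightward = reichardt_detector(a, b).mean()  # preferred direction
leftward = reichardt_detector(b, a).mean()   # null direction
```

The opponent subtraction makes the mean output positive for motion from A to B and, by symmetry, exactly negative for the reverse direction.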
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion, algorithms and circuits: the beetle (and the fly); relative motion, algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
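The two ingredients discussed in this excerpt, an asymmetric delay and an inhibitory 'veto' that acts nonlinearly (shunting, i.e. divisively) rather than by linear subtraction, can be sketched as a toy discrete model. This is only an illustration of the idea, not the paper's biophysical model: the one-step delay, the gain `g_max`, and the moving-bar stimulus are assumptions.

```python
import numpy as np

def veto_response(stimulus, g_max=10.0):
    """Toy direction-selective unit: excitation from position i is
    divisively 'vetoed' by delayed input from the neighboring
    position i+1 (shunting inhibition), so motion in one direction
    is suppressed while the opposite direction passes."""
    # exc[k, i]: input at position i, time k+1 (direct excitation)
    exc = stimulus[1:, :-1]
    # inh[k, i]: input at position i+1, time k (delayed neighbor)
    inh = stimulus[:-1, 1:]
    # Shunting (divisive) interaction instead of a linear subtraction.
    return float((exc / (1.0 + g_max * inh)).sum())

n = 8
bar_right = np.eye(n)            # bright bar at position t at time t
bar_left = np.fliplr(np.eye(n))  # the same bar sweeping the other way
preferred = veto_response(bar_right)
null = veto_response(bar_left)
```

In the preferred direction the excitation and the delayed inhibition never coincide, so the response is large; in the null direction they line up and the divisive veto suppresses it.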
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
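The regularization idea introduced here, turning an ill-posed problem into a well-posed one by adding a smoothness term, can be sketched with standard Tikhonov regularization. This is a hedged sketch under assumed choices: the sampling matrix, the first-difference stabilizer, and the value of lambda are illustrative, not the paper's specific functionals.

```python
import numpy as np

def regularized_solve(A, b, lam=1e-2):
    """Minimize ||A x - b||^2 + lam * ||D x||^2 in closed form,
    where D is a first-difference operator penalizing non-smooth
    solutions: x = (A^T A + lam * D^T D)^{-1} A^T b."""
    n = A.shape[1]
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)  # first differences
    return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)

# Ill-posed toy problem: recover an 11-point signal from 3 samples.
n = 11
x_true = np.linspace(0.0, 1.0, n)   # a smooth (linear) signal
A = np.eye(n)[[0, 5, 10]]           # we observe only 3 of 11 points
x_hat = regularized_solve(A, A @ x_true, lam=1e-6)
```

Without the smoothness term, A^T A is singular (8 of the 11 values are completely unconstrained by the data); the regularizer selects the smoothest interpolant, which for this ramp recovers the signal almost exactly.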
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
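The cooperative algorithm this paper describes iterates local excitation among matches at the same disparity (continuity) and inhibition among competing matches along lines of sight (uniqueness). Below is a much-simplified sketch of that style of update rule; the neighborhood, gains, threshold, and iteration count are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def cooperative_stereo(left, right, max_disp=3, iters=4,
                       inhibition=2.0, threshold=3.0):
    """Simplified cooperative stereo sketch.

    State C[y, x, d] = 1 where a match between left[y, x] and
    right[y, x + d] is currently believed. Each iteration sums
    excitatory support from same-disparity neighbors (continuity
    constraint) and subtracts inhibition from competing disparities
    at the same position (a crude stand-in for uniqueness)."""
    h, w = left.shape
    C = np.zeros((h, w, max_disp + 1))
    for d in range(max_disp + 1):
        # Initial state: every locally possible match.
        C[:, : w - d, d] = (left[:, : w - d] == right[:, d:]).astype(float)
    C0 = C.copy()
    for _ in range(iters):
        exc = np.zeros_like(C)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (dy, dx) != (0, 0):
                    exc += np.roll(np.roll(C, dy, axis=0), dx, axis=1)
        inh = C.sum(axis=2, keepdims=True) - C   # competing matches
        C = ((exc - inhibition * inh + C0) > threshold).astype(float)
    return C.argmax(axis=2)                      # disparity map
```

On a random-dot stereogram the correct disparity plane receives consistent neighborhood support, while spurious matches tend to be isolated and are suppressed over the iterations.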
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
bull Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels: computation; algorithms; biophysics and circuits
bull The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
bull …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
bull …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
$$\min_{f \in H}\;\frac{1}{n}\sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2$$
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
(T. Poggio and C.R. Shelton)
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate…
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\big(|X_n - X| \geq \epsilon\big) = 0.$$
Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$
Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n\geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.
Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.
Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X\times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.
Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n}\sum_{i=1}^{n} V(f, z_i).$$
Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] - \inf_{f\in\mathcal{H}} I[f] \leq \epsilon \Big) = 1.$$
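The stability property at the heart of this letter (delete one training example and check that the learned hypothesis changes little) can be probed numerically. A minimal sketch, assuming regularized least squares as the learning map; the function names, kernel and parameter values are illustrative, not the paper's procedure:

```python
import numpy as np

def rbf(A, B, s=1.0):
    # 1-D Gaussian kernel matrix.
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * s * s))

def rls_fit(x, y, mu):
    # Regularized least squares coefficients: (K + mu*n*I) c = y.
    n = len(x)
    return np.linalg.solve(rbf(x, x) + mu * n * np.eye(n), y)

def loo_stability(x, y, grid, mu):
    """Largest change of the learned function (evaluated on a probe grid)
    when one training example is deleted: a numerical proxy for the
    leave-one-out stability of the learning map."""
    f_full = rbf(grid, x) @ rls_fit(x, y, mu)
    worst = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        f_i = rbf(grid, x[keep]) @ rls_fit(x[keep], y[keep], mu)
        worst = max(worst, np.abs(f_full - f_i).max())
    return worst
```

More regularization yields a more stable learning map: with a large `mu` the hypothesis barely moves when an example is removed, while a near-interpolating fit (tiny `mu`) can change noticeably.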
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419 (www.nature.com/nature). © 2004 Nature Publishing Group
Why do hierarchical architectures work?
bull Training database: 1,000+ real, 3,000+ virtual faces bull 500,000+ non-face patterns
Sung & Poggio, 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
bull Human brain: ~10^10-10^11 neurons (~1 million flies); ~10^14-10^15 synapses
Vision: what is where
bull Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls, and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
[Stimulus sequence: image (20 ms); image-mask interval (30 ms ISI); mask of 1/f noise (80 ms)]
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
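Hierarchical feedforward models of this kind alternate a tuning ("S") stage with an invariance ("C") stage. A toy sketch in the spirit of that architecture; the Gaussian tuning, template contents and pooling size here are illustrative choices, not the published model's parameters:

```python
import numpy as np

def s_layer(image, templates):
    """Tuning (S) stage: template matching. Each unit's response measures
    the similarity between a local image patch and a stored template."""
    h, w = image.shape
    th, tw = templates.shape[1:]
    out = np.zeros((len(templates), h - th + 1, w - tw + 1))
    for k, t in enumerate(templates):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + th, j:j + tw]
                out[k, i, j] = np.exp(-np.sum((patch - t) ** 2))  # Gaussian tuning
    return out

def c_layer(s_maps, pool=2):
    """Invariance (C) stage: max pooling over local position, which makes
    responses tolerant to small shifts of the preferred stimulus."""
    k, h, w = s_maps.shape
    out = np.zeros((k, h // pool, w // pool))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = s_maps[:, i * pool:(i + 1) * pool,
                                  j * pool:(j + 1) * pool].max(axis=(1, 2))
    return out
```

Stacking several S/C pairs yields units with progressively larger receptive fields, more complex preferred stimuli, and more invariance, mirroring the V1-to-IT progression described above.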
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
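The "matrix-like read-out" idea (a linear classifier applied to a population response vector) can be illustrated on simulated data. This is not the analysis pipeline of Hung et al., only a sketch of the logic, under the simplifying assumption that a position change mainly rescales each neuron's response: a linear readout trained at one position then transfers to another:

```python
import numpy as np

def simulate_population(rng, pref, n_trials=200, gain=1.0, noise=0.3):
    """Hypothetical IT-like population (illustrative, not real data):
    each neuron has a fixed signed category preference `pref`; changing
    stimulus position rescales responses by `gain`, so the population
    pattern for a category keeps its direction across positions."""
    labels = rng.randint(0, 2, n_trials)      # category A = 0, B = 1
    signs = 2 * labels - 1
    R = gain * np.outer(signs, pref) + noise * rng.randn(n_trials, len(pref))
    return R, labels

def train_linear_readout(R, labels):
    # Least-squares linear readout w: sign(R @ w) predicts the category.
    return np.linalg.lstsq(R, 2 * labels - 1, rcond=None)[0]

def accuracy(w, R, labels):
    return np.mean(((R @ w) > 0).astype(int) == labels)
```

Training the readout at one "position" (gain 1.0) and testing at another (gain 0.6) leaves accuracy high, which is the sense in which a simple linear decoder can read out identity invariantly to position and scale.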
… in 2013 …
We aim to make progress in understanding the greatest of all problems in science: the problem of intelligence. This means understanding how the brain makes the mind, how the brain works, and how to build intelligent machines. We believe that the science of intelligence will enable better engineering of intelligence in the long term.
CBMM's focus is the Science and the Engineering of Intelligence.
Key recent advances in the engineering of intelligence have their roots in basic research on the brain
The CBMM bet (different from DeepMind):
understand how the brain works (then) make intelligent machines
The problem of intelligence is the greatest problem in science
EAC- May 2020
CBMM Organizational Chart (future)
Director: Tomaso Poggio
EAC
Managing Director: Kathleen Sullivan (MIT)
Education Coordinator: Ellen Hildreth (WC)
Education Evaluation: Lizanne DeStefano (GT)
KT Coordinator: Boris Katz (MIT)
Diversity Coordinator: Mandana Sassanfar (MIT)
Deputy Director: Gabriel Kreiman (HU)
Associate Director & Trainee Coordinator: Matt Wilson (MIT)
Research Director: Kenneth Blum (HU)
Administrative Assistant
Technology Director
Module 1, VISUAL STREAM: Tomaso Poggio, Shimon Ullman (MIT)
Module 2, BRAIN OS: Gabriel Kreiman (HU)
Module 3, COGNITIVE CORE: Nancy Kanwisher, Joshua Tenenbaum (MIT)
Module 4, TOWARDS SYMBOLS: Boris Katz, Shimon Ullman (MIT)
Jim DiCarlo
CBMM Participants
[Chart: CBMM participants by year (Year 1 through Year 7), broken down into faculty, research scientists, postdocs, grad students, undergrads, staff and other, with totals]
EAC
Demis Hassabis, DeepMind
Charles Isbell Jr., Georgia Tech
Christof Koch, Allen Institute
Fei-Fei Li, Stanford
Lore McGovern, MIBR, MIT
Joel Oppenheim, NYU
Pietro Perona, Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua, Mobileye
David Siegel, Two Sigma
Susan Whitehead, MIT Corporation
Jim Pallotta, The Raptor Group
Research Education amp Diversity Partners
MIT: Boyden, Desimone, DiCarlo, Kaelbling, Kanwisher, Katz, McDermott, Oliva, Poggio, Roy, Sassanfar, Saxe, Schulz, Tegmark, Tenenbaum, Torralba, Ullman, Wilson
Harvard: Blum, Gershman, Kreiman, Livingstone, Sompolinsky, Spelke
Howard U: Chouika, Manaye, Rwebangira, Salmani
Hunter College: Chodorow, Epstein, Sakas, Zeigler
Johns Hopkins U: Isik
Queens College: Brumberg
Rockefeller U: Freiwald
Stanford U: Goodman
Universidad Central del Caribe (UCC): Jorquera
University of Central Florida: McNair Program
UMass Boston: Blaser, Ciaramitaro, Pomplun, Shukla
Wellesley College: Hildreth, Wiest, Wilmer
UPR-Mayagüez: Santiago, Vega-Riveros
UPR-Río Piedras: Garcia-Arraras, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga
Harvard Medical School: Kreiman, Livingstone
Florida International U: Finlayson
Boston Children's Hospital: Kreiman
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger, Mobileye, Intel
Fujitsu
GE
KAIST
Videos: ~950 (May 2014 - April 2020)
(of YouTube subscribers only; 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code, Software and Datasets
There's Waldo: A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands: people are extremely good at this task, while networks perform at near chance level.
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu.
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
bull Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
bull Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, it is likely that several of the next breakthroughs in ML and AI will come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly…
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
bull Fixation and tracking behavior of the fly (cognition in the fly… similar to the Bayesian approach to cognition in humans… no neurons)
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
bull The beetle follows the motion
bull Each photoreceptor sees only an alternation of dark and light: how is motion computed?
bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
bull The same model describes motion perception in flies (beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz)
bull An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
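The correlation-type (Hassenstein-Reichardt) detector itself is compact enough to write down. A sketch using a first-order low-pass filter as the delay stage; the time constant and stimulus here are illustrative choices:

```python
import numpy as np

def reichardt_detector(a, b, tau=5.0, dt=1.0):
    """Correlation-type motion detector sketch.
    a, b: luminance signals from two neighbouring photoreceptors.
    Each subunit multiplies one signal by a low-pass-filtered (delayed)
    copy of the other; the opponent subtraction makes the time-averaged
    output positive for a->b motion and negative for b->a motion."""
    alpha = dt / (tau + dt)
    a_lp = np.zeros_like(a)
    b_lp = np.zeros_like(b)
    for t in range(1, len(a)):          # first-order low-pass as the delay
        a_lp[t] = a_lp[t - 1] + alpha * (a[t] - a_lp[t - 1])
        b_lp[t] = b_lp[t - 1] + alpha * (b[t] - b_lp[t - 1])
    return a_lp * b - b_lp * a          # opponent multiplication stage

# A drifting sinusoidal grating: receptor b sees the stimulus after a,
# i.e. motion in the a -> b direction.
t = np.arange(500)
a = np.sin(2 * np.pi * t / 50)
b = np.sin(2 * np.pi * t / 50 - 0.5)
```

Swapping the two inputs flips the sign of the output, which is exactly the direction selectivity of the detector.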
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany.
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
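The veto interaction discussed above (a delayed inhibition that suppresses, rather than subtracts from, excitation in the null direction) can be sketched with a divisive, shunting-style nonlinearity. The filter, gain and stimulus values below are illustrative assumptions, not the paper's biophysical parameters:

```python
import numpy as np

def veto_ds_unit(exc, inh, tau=15.0, k=10.0):
    """Sketch of a shunting ('veto') direction-selective unit: a delayed
    inhibitory conductance from the neighbouring receptor divides the
    excitatory signal, approximating the AND-NOT scheme of Barlow & Levick.
    Null-direction motion makes excitation coincide with the delayed
    inhibition, so the response is suppressed."""
    alpha = 1.0 / (tau + 1.0)
    inh_lp = np.zeros_like(inh)
    for t in range(1, len(inh)):        # low-pass filter as the asymmetric delay
        inh_lp[t] = inh_lp[t - 1] + alpha * (inh[t] - inh_lp[t - 1])
    return exc / (1.0 + k * inh_lp)     # divisive (shunting) veto

def bar(onset, width=10, n=200):
    # A bright bar passing over one receptor: a rectangular pulse in time.
    s = np.zeros(n)
    s[onset:onset + width] = 1.0
    return s

# Preferred direction: the bar reaches the excitatory receptor first.
pref = veto_ds_unit(bar(50), bar(70))
# Null direction: the bar reaches the inhibitory receptor first.
null = veto_ds_unit(bar(70), bar(50))
```

In the preferred direction the excitation arrives before the inhibition has built up, so the response is essentially unvetoed; in the null direction the lingering inhibitory conductance divides the response down.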
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. Since so much information is lost during the imaging process that projects the three-dimensional world into two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity; the tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous; it can be made unique only by adding information or assumptions.
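This aperture problem can be made concrete in a few lines. Assuming hypothetical local measurements Ix, Iy, It of the spatial and temporal image derivatives (illustrative numbers, not from any figure), the brightness-constancy constraint Ix·u + Iy·v + It = 0 is one linear equation in two unknowns, so only the component of velocity along the gradient, the normal flow, is determined:

```python
import numpy as np

# Hypothetical local derivative measurements at one image point
Ix, Iy, It = 1.0, 2.0, -3.0
g = np.array([Ix, Iy])

# Brightness-constancy constraint: Ix*u + Iy*v + It = 0. The minimum-norm
# solution is the normal component of flow; the tangential direction is
# invisible to purely local measurements.
v_normal = -It * g / (g @ g)
tangent = np.array([-Iy, Ix])

for a in (-1.0, 0.0, 2.5):           # a whole line of velocities fits the data
    v = v_normal + a * tangent
    assert abs(g @ v + It) < 1e-12   # every one satisfies the constraint
print(v_normal)                      # [0.6 1.2]
```

Any additional information that disambiguates the flow (smoothness of the velocity field, features such as corners) amounts to one of the natural constraints discussed above.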
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
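The multiplicative (correlation) scheme can be sketched in a few lines. This is a toy version of the Hassenstein-Reichardt detector, my construction rather than anything from the paper, with a simple discrete delay standing in for the low-pass filters of figure 1:

```python
import numpy as np

def reichardt(s1, s2, delay=3):
    # Each subunit multiplies one receptor's delayed signal with the
    # neighbour's undelayed signal; the opponent subtraction gives a signed,
    # direction-selective output.
    return float(np.mean(np.roll(s1, delay) * s2 - np.roll(s2, delay) * s1))

t = np.arange(200)
stim = np.sin(2 * np.pi * t / 20)    # moving grating seen by receptor 1
lag = 3                              # it reaches receptor 2 three steps later

rightward = reichardt(stim, np.roll(stim, lag))
leftward = reichardt(np.roll(stim, lag), stim)
print(rightward > 0, leftward < 0)   # opposite signs for opposite directions
```

The sign flip for reversed motion comes entirely from the nonlinear (multiplicative) interaction; a purely linear detector averaged over time could not distinguish the two directions.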
Cooperative neural network for stereo
1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
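For the square loss, the regularized functional on this slide has a well-known closed-form minimizer over a reproducing-kernel space (kernel ridge regression): by the representer theorem, f(x) = Σ_i c_i K(x, x_i) with (G + μℓI)c = y, where G is the kernel Gram matrix. A minimal sketch, with a Gaussian kernel and parameters chosen purely for illustration:

```python
import numpy as np

def krr_fit(X, y, mu, sigma=1.0):
    # Minimize (1/l) sum_i V(y_i, f(x_i)) + mu ||f||_K^2 for the square loss:
    # f(x) = sum_i c_i K(x, x_i), with c = (G + mu * l * I)^{-1} y.
    l = len(X)
    G = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * sigma**2))  # Gram matrix
    c = np.linalg.solve(G + mu * l * np.eye(l), y)
    return lambda x: np.exp(-((x - X) ** 2) / (2 * sigma**2)) @ c

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(-3, 3, 40))
y = np.sin(X) + 0.1 * rng.standard_normal(40)

f = krr_fit(X, y, mu=1e-3)
err = np.mean([(f(x) - np.sin(x)) ** 2 for x in np.linspace(-2, 2, 50)])
print(err)   # small mean-squared error against the underlying sin curve
```

The penalty term μ‖f‖²_K plays the same stabilizing role as the smoothness operator in the regularization theory of early vision: it trades a little empirical error for a well-posed, predictive solution.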
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio1, Ryan Rifkin1,4, Sayan Mukherjee1,3 & Partha Niyogi2
1 Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
2 Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
3 Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
4 Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1-5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
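The stability property described in this abstract (delete one training example and the learned hypothesis should change little) can be probed empirically. The following toy sketch is my own construction, not the paper's formal definition: it uses regularized least squares as the learning map and measures the largest leave-one-out change in predictions, which shrinks as the training set grows:

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Regularized least squares: w = (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loo_stability(X, y, lam):
    # Largest change in prediction over the sample when one training point
    # is deleted -- the kind of perturbation the stability argument uses.
    w_full = fit_ridge(X, y, lam)
    n = len(X)
    changes = []
    for i in range(n):
        keep = np.arange(n) != i
        w_i = fit_ridge(X[keep], y[keep], lam)
        changes.append(np.max(np.abs(X @ (w_full - w_i))))
    return max(changes)

rng = np.random.default_rng(2)

def sample(n):
    X = rng.standard_normal((n, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(n)
    return X, y

small_n = loo_stability(*sample(50), 1.0)
large_n = loo_stability(*sample(500), 1.0)
print(small_n > large_n)   # deleting one example matters less as n grows
```

This illustrates the intuition behind the theory: an algorithm whose output is insensitive to single-point deletions cannot have fit the noise in any one example, which is what licenses generalization.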
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables, such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
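This classification-from-examples setting can be illustrated with a deliberately simplified sketch: synthetic stand-ins for expression profiles and a nearest-centroid rule learned from labelled examples (not the actual algorithms used in the microarray work):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for expression profiles: 30 "genes", two tumour classes whose
# mean expression differs on the first five genes. Entirely synthetic.
def sample_patients(n, shift):
    X = rng.standard_normal((n, 30))
    X[:, :5] += shift                 # class-dependent genes
    return X

train = np.vstack([sample_patients(20, 0.0), sample_patients(20, 2.0)])
labels = np.array([0] * 20 + [1] * 20)

# Nearest-centroid rule learned from the labelled examples alone
centroids = np.stack([train[labels == c].mean(axis=0) for c in (0, 1)])

def diagnose(x):
    return int(np.argmin(((centroids - x) ** 2).sum(axis=1)))

test = np.vstack([sample_patients(50, 0.0), sample_patients(50, 2.0)])
truth = np.array([0] * 50 + [1] * 50)
acc = np.mean([diagnose(x) == c for x, c in zip(test, truth)])
print(acc)   # high accuracy on held-out "patients"
```

The point is the workflow, not the classifier: nothing about the rule was programmed in; it was synthesized entirely from the labelled training examples.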
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big)$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 1$$
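The definitions in Box 1 can be exercised on a toy problem. In this entirely synthetic sketch, ERM selects from a finite hypothesis space of linear functions the minimizer of the empirical error under the square loss, the expected error I[f] is approximated by a large held-out sample, and the generalization gap |I[f_S] - I_S[f_S]| shrinks as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_z(n):
    # n i.i.d. pairs z = (x, y) from a fixed (synthetic) distribution mu
    x = rng.uniform(-1.0, 1.0, n)
    y = 2.0 * x + 0.3 * rng.standard_normal(n)
    return x, y

H = np.linspace(-4.0, 4.0, 81)       # finite hypothesis space: f(x) = a * x

def erm(x, y, slopes):
    # ERM with the square loss V(f, z) = (f(x) - y)^2
    return slopes[int(np.argmin([np.mean((a * x - y) ** 2) for a in slopes]))]

x_big, y_big = sample_z(100_000)     # large fresh sample: proxy for I[f]

def mean_gap(n, trials=50):
    g = []
    for _ in range(trials):
        x, y = sample_z(n)
        a = erm(x, y, H)
        I_S = np.mean((a * x - y) ** 2)           # empirical error on S
        I = np.mean((a * x_big - y_big) ** 2)     # estimated expected error
        g.append(abs(I - I_S))
    return float(np.mean(g))

g10, g1000 = mean_gap(10), mean_gap(1000)
print(round(g10, 3), round(g1000, 3))   # the generalization gap shrinks with n
```

With ten examples the empirical error is an optimistic estimate of the expected error; with a thousand, the two nearly coincide, which is exactly the generalization property defined above.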
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio, 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human Brain
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake & Tanaka, 1994
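The gradual build-up of invariance along the hierarchy can be caricatured in a few lines. In this toy sketch (my construction, not the model from these papers) each stage max-pools over neighbouring positions, so receptive fields grow and the responses to a shifted stimulus become progressively more similar:

```python
import numpy as np

def maxpool(r, p=2):
    # One simplified stage: max over p neighbouring units, so receptive
    # fields grow and position tolerance increases at each level.
    return np.array([r[i:i + p].max() for i in range(0, len(r), p)])

def stimulus(pos, n=16):
    r = np.zeros(n)
    r[pos] = 1.0
    return r

a, b = stimulus(4), stimulus(5)     # same feature at two nearby positions
overlaps = []
for _ in range(4):                  # four toy stages: V1 -> V2 -> V4 -> IT
    overlaps.append(float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    a, b = maxpool(a), maxpool(b)
print(overlaps)                     # population responses converge going up
```

At the first stage the two population responses are orthogonal; after pooling, the shifted stimulus drives the same units, a cartoon of the position invariance measured along the ventral stream.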
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol. 5, No. 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image: 20 ms
Interval image–mask (ISI): 30 ms
Mask (1/f noise): 80 ms
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
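The core architectural idea of these models, going back to Riesenhuber & Poggio 1999, is an alternation of template-matching ("S", simple-cell-like) and max-pooling ("C", complex-cell-like) stages, gradually building selectivity and invariance. A minimal sketch of one S/C pair; the filters, sizes, and pooling window here are toy stand-ins, not the Gabor filters of the actual model:

```python
import numpy as np

def s_layer(image, filters):
    """'Simple'-cell stage: template matching (valid cross-correlation)
    of the image with each filter."""
    H, W = image.shape
    fh, fw = filters[0].shape
    out = np.zeros((len(filters), H - fh + 1, W - fw + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * f)
    return out

def c_layer(s_maps, pool=2):
    """'Complex'-cell stage: local max pooling, giving tolerance to
    small position (and, across scales, size) changes."""
    K, H, W = s_maps.shape
    out = np.zeros((K, H // pool, W // pool))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = s_maps[:, i * pool:(i + 1) * pool,
                                  j * pool:(j + 1) * pool].max(axis=(1, 2))
    return out

# Two toy oriented edge templates (stand-ins for Gabor-like S1 units).
filters = [np.array([[1., -1.], [1., -1.]]),   # vertical edge
           np.array([[1., 1.], [-1., -1.]])]   # horizontal edge

image = np.zeros((8, 8))
image[:, 4:] = 1.0                       # a vertical edge in the input
c1 = c_layer(s_layer(image, filters))    # C1-like position-tolerant responses
print(c1.shape)                          # (2, 3, 3)
```

Stacking several such S/C pairs, with learned templates at the higher S stages, yields the feedforward hierarchy discussed here.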
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
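The "matrix-like read-out" refers to training simple linear classifiers on the responses of a recorded neural population. A minimal sketch on synthetic data; the "IT population" model, unit counts, and noise levels below are invented for illustration, not the recorded data of Hung et al.:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for an IT population: n_units firing rates per trial,
# with a category-dependent mean pattern plus trial-to-trial noise.
n_units, n_trials = 64, 200
mean_a = rng.normal(0, 1, n_units)
mean_b = rng.normal(0, 1, n_units)

def population_response(category, n):
    mean = mean_a if category == 0 else mean_b
    return mean + rng.normal(0, 1.0, (n, n_units))

X = np.vstack([population_response(0, n_trials),
               population_response(1, n_trials)])
y = np.array([0] * n_trials + [1] * n_trials)

# 'Matrix-like' linear readout: least-squares weights plus a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, 2 * y - 1, rcond=None)
accuracy = float(np.mean((A @ w > 0) == (y == 1)))
print(accuracy)   # well above chance for a separable population code
```

In practice the same weights generalize across position and scale when the population code itself is invariant, which is the point of the readout experiments.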
… in 2013 …
The CBMM bet (different from Deep Mind)
understand how the brain works (then) make intelligent machines
The problem of intelligence is the greatest problem in science
EAC- May 2020
CBMM Organizational Chart (future)
Director: Tomaso Poggio
EAC
Managing Director: Kathleen Sullivan (MIT)
Education Coordinator: Ellen Hildreth (WC)
Education Evaluation: Lizanne DeStefano (GT)
KT Coordinator: Boris Katz (MIT)
Diversity Coordinator: Mandana Sassanfar (MIT)
Deputy Director: Gabriel Kreiman (HU)
Associate Director & Trainee Coordinator: Matt Wilson (MIT)
Research Director: Kenneth Blum (HU)
Administrative Assistant
Technology Director
Module 1, VISUAL STREAM: Tomaso Poggio, Shimon Ullman (MIT)
Module 2, BRAIN OS: Gabriel Kreiman (HU)
Module 3, COGNITIVE CORE: Nancy Kanwisher, Joshua Tenenbaum (MIT)
Module 4, TOWARDS SYMBOLS: Boris Katz, Shimon Ullman (MIT)
Jim DiCarlo
CBMM Participants
[Chart: participant counts by year (Year 1–Year 7) for Faculty, Research Scientists, Postdocs, Grad Students, Undergrads, Staff/Other, and Total]
EAC
Demis Hassabis DeepMind
Charles Isbell Jr Georgia Tech
Christof Koch Allen Institute
Fei-Fei Li Stanford
Lore McGovern MIBR MIT
Joel Oppenheim NYU
Pietro Perona Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua Mobileye
David Siegel Two Sigma
Susan Whitehead MIT Corporation
Jim Pallotta The Raptor group
Research Education amp Diversity Partners
Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba
Blum Gershman Kreiman Livingstone Sompolinsky Spelke
MIT Harvard
Chouika Manaye Rwebangira Salmani
Howard U
Hunter College
Isik
Johns Hopkins U
Brumberg
Queens College
Chodorow Epstein Sakas Zeigler
Freiwald
Rockefeller U
Stanford U
Jorquera
Universidad Central Del Caribe (UCC)
McNair Program
University of Central Florida
Goodman
Blaser Ciaramitaro Pomplun Shukla
UMass Boston, UPR–Mayagüez, UPR–Río Piedras
Hildreth Wiest Wilmer
Wellesley College
Santiago, Vega-Riveros, García-Arrarás, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga
Kreiman Livingstone
Harvard Medical School
Finlayson
Florida International U
Kreiman
Boston Children's Hospital
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger, Mobileye, Intel
Fujitsu
GE
KAIST
Videos: ~950 (May 2014 – April 2020)
(of YouTube subscribers only – 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex, 2016
- See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects
Partially Occluded Hands B Myanganbayar C Mata G Dekel B Katz G Ben-Yosef A Barbu
A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence
A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted
Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM SummerSchool
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185–203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811–815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311–375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123–130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479–480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
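The correlation scheme in these bullets can be sketched in a few lines. A one-sample discrete delay stands in for the low-pass filter of the original model, and the drifting sinusoid and delay values are illustrative:

```python
import numpy as np

def reichardt(left, right, delay=1):
    """Hassenstein-Reichardt correlator: each input is multiplied by a
    delayed copy of its neighbour's signal, and the two mirror-image
    products are subtracted (the opponent stage)."""
    d_left = np.roll(left, delay)    # one-sample delay standing in for the
    d_right = np.roll(right, delay)  # low-pass filter of the original model
    return d_left * right - left * d_right

t = np.arange(200)
stimulus = np.sin(2 * np.pi * t / 20)   # periodic pattern seen by one receptor

# Rightward motion: the pattern reaches the right receptor 2 samples later.
rightward = float(np.mean(reichardt(stimulus, np.roll(stimulus, 2))))
# Leftward motion: arrival order reversed, so the output changes sign.
leftward = float(np.mean(reichardt(np.roll(stimulus, 2), stimulus)))

print(rightward, leftward)   # positive for rightward, negative for leftward
```

The sign of the time-averaged output reports the direction of motion, which is exactly what the optomotor experiments on Chlorophanus measured behaviorally.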
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409–416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. — Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
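The two competing nonlinear interactions discussed in this excerpt, Reichardt's multiplicative conjunction and the Barlow–Levick "veto" (which the paper proposes to implement with shunting, i.e. divisive, synaptic inhibition), can be contrasted in a toy sketch; the gain g and the binary input patterns are hypothetical:

```python
import numpy as np

def multiplicative(exc, delayed):
    """Hassenstein-Reichardt scheme: the conjunction of the two channels
    is detected as a product."""
    return exc * delayed

def veto(exc, delayed_inh, g=10.0):
    """Barlow-Levick 'veto' via shunting inhibition: the delayed inhibitory
    conductance divides the excitatory signal instead of subtracting from it."""
    return exc / (1.0 + g * delayed_inh)

# The four combinations of (excitatory, delayed-neighbour) activity.
exc = np.array([0.0, 1.0, 0.0, 1.0])
inh = np.array([0.0, 0.0, 1.0, 1.0])

print(multiplicative(exc, inh))  # responds only when both channels are active
print(veto(exc, inh))            # responds unless the delayed signal vetoes it
```

For strong shunting (large g) the veto scheme approximates an AND-NOT gate, which is why the paper can reconcile it with the multiplicative-type nonlinearity required by the behavioural data.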
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
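The regularization idea sketched here, stabilizing an underdetermined problem by adding a smoothness term, can be illustrated with a minimal Tikhonov example: reconstructing a 1-D "surface" from sparse, noisy samples. The data model, the first-difference smoothness penalty, and λ are illustrative choices, not the paper's specific functionals:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sparse, noisy samples of an underlying 1-D 'surface' (a depth profile).
n = 50
truth = np.sin(np.linspace(0, 2 * np.pi, n))
mask = rng.random(n) < 0.3                  # only ~30% of points observed
data = truth + rng.normal(0, 0.2, n)

# Tikhonov regularization: minimize  sum_observed (f_i - d_i)^2 + lam*||D f||^2,
# where D is a first-difference operator. The quadratic functional turns the
# ill-posed interpolation problem into a well-posed linear system.
lam = 1.0
D = np.diff(np.eye(n), axis=0)              # (n-1, n) smoothness operator
M = np.diag(mask.astype(float))             # selects the observed points
f = np.linalg.solve(M + lam * D.T @ D, M @ data)

print(np.abs(f - truth).mean())             # error of the regularized estimate
```

The same recipe, a data term plus a stabilizing functional, is what the paper applies to optical flow, edge detection and surface reconstruction, and what makes the parallel analog-circuit implementations possible.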
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tuebingen
)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz
1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg
nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2
sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2
0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2
3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2
6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2
sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2
=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2
U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2
$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2
$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2
$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2
$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2
$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2
$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2
$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2
$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2
$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2
sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2
$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2
sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2
sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2
13
13
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process; his model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
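The false-targets ambiguity that motivates the cooperative stereo algorithm is easy to reproduce numerically. A minimal sketch (numpy; the row length, dot density and disparity range are illustrative choices, not parameters from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_d, max_d = 200, 3, 5

left = (rng.random(n) < 0.5).astype(int)   # one row of a random-dot stereogram
right = np.roll(left, true_d)              # same row, shifted by the true disparity

# Candidate matches: C[x, d] = 1 if both eyes see a dot along that pairing
C = np.zeros((n, 2 * max_d + 1), dtype=int)
for d in range(-max_d, max_d + 1):
    C[:, d + max_d] = left * np.roll(right, -d)

# Every dot matches at the true disparity...
true_col = C[left == 1, true_d + max_d]
# ...but each dot also produces false matches at many other disparities,
# which the cooperative constraints (uniqueness, continuity) must prune
false_matches = C[left == 1].sum(axis=1) - 1
```

With dot density 0.5, each dot picks up roughly five false candidate disparities on average; that ambiguity is exactly what the uniqueness and continuity rules of the cooperative network are designed to resolve.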
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
f* = argmin_{f ∈ H_K} [ (1/n) Σ_{i=1}^n V(y_i, f(x_i)) + μ ‖f‖²_K ]
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
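The stability property in the abstract — delete one training example and the learned hypothesis barely moves — can be checked directly for a stable algorithm such as regularized least squares. A sketch on synthetic data (numpy; the dimensions, λ and noise level are arbitrary illustrative choices):

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Regularized least squares: w = (X^T X + lam I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w_full = fit_ridge(X, y)
# Leave-one-out perturbations: how much does the hypothesis move?
changes = []
for i in range(n):
    mask = np.arange(n) != i
    changes.append(np.linalg.norm(w_full - fit_ridge(X[mask], y[mask])))
max_change = max(changes)
```

For this algorithm the worst-case leave-one-out change shrinks as the training set grows, which is the kind of stability the paper connects to generalization.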
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = (z_1 = (x_1, y_1), …, z_n = (x_n, y_n)).
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².
Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1.
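Box 1's two error measures are easy to juxtapose numerically: I_S[f] is an average over the n training points, while I[f] can only be approximated by sampling the distribution (known here by construction, unknown in practice). A sketch (numpy; the toy distribution and hypothesis are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    # Toy "unknown" distribution mu(x, y): y = x^2 plus Gaussian noise
    x = rng.uniform(-1, 1, n)
    y = x ** 2 + 0.05 * rng.standard_normal(n)
    return x, y

f = lambda x: x ** 2                 # candidate hypothesis
V = lambda fx, y: (fx - y) ** 2      # square loss

# Empirical error I_S[f] on a training set of n = 50 samples
x_tr, y_tr = sample(50)
I_S = np.mean(V(f(x_tr), y_tr))

# Expected error I[f], approximated by Monte Carlo on a huge fresh sample
x_big, y_big = sample(1_000_000)
I = np.mean(V(f(x_big), y_big))
gap = abs(I - I_S)
```

For this hypothesis, both errors sit near the irreducible noise level, and the gap |I[f] − I_S[f]| is small; generalization in Box 1's sense is the statement that this gap vanishes as n grows.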
Letters to Nature
Nature, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
Since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
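This gradual build-up of invariance is the core of HMAX-style feedforward models: alternating template-matching ("simple", S) and max-pooling ("complex", C) layers. A toy sketch of one S-C stage (numpy; the bar template, image size and pooling grid are illustrative choices, not the model's actual parameters):

```python
import numpy as np

def s_layer(image, template):
    # S ("simple") stage: template match at every position (valid correlation)
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+h, j:j+w] * template).sum()
    return out

def c_layer(s, pool=4):
    # C ("complex") stage: max pooling over position -> tolerance to shifts
    H, W = s.shape
    return np.array([[s[i:i+pool, j:j+pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

bar = np.ones((1, 3))                        # horizontal-bar template
img = np.zeros((12, 12)); img[4, 2:5] = 1    # a bar...
img2 = np.zeros((12, 12)); img2[5, 3:6] = 1  # ...shifted by one pixel

r1 = c_layer(s_layer(img, bar))
r2 = c_layer(s_layer(img2, bar))
```

The max over position leaves the peak response unchanged under the shift; stacking such stages produces the increasing receptive-field size and invariance observed along V1, V2, V4, IT.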
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Task timeline: image, 20 ms; image-mask interval (ISI), 30 ms; mask (1/f noise), 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
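The "matrix-like read-out" is essentially a linear classifier applied to a population response matrix (trials x neurons). A toy sketch with synthetic stand-ins for IT responses (numpy; the neuron count, noise level and least-squares readout are illustrative assumptions, not the published analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
n_neurons, n_trials = 100, 60

# Synthetic "population responses": each category evokes a mean firing
# pattern across neurons, plus trial-to-trial noise
patterns = rng.standard_normal((2, n_neurons))
X = np.vstack([patterns[c] + 0.8 * rng.standard_normal((n_trials, n_neurons))
               for c in (0, 1)])
y = np.repeat([0, 1], n_trials)

# Linear readout: least-squares classifier trained on half the trials
train = np.r_[0:30, 60:90]
test = np.r_[30:60, 90:120]
W = np.linalg.lstsq(np.c_[X[train], np.ones(60)], 2 * y[train] - 1,
                    rcond=None)[0]
pred = (np.c_[X[test], np.ones(60)] @ W > 0).astype(int)
accuracy = (pred == y[test]).mean()
```

When the population carries a category signal, even this simple linear decoder recovers the label from single trials, which is the logic of the readout experiments.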
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
CBMM Participants by year (Years 1-7): faculty, research scientists, postdocs, grad students, undergrads, staff, other, and total
EAC
Demis Hassabis DeepMind
Charles Isbell Jr Georgia Tech
Christof Koch Allen Institute
Fei-Fei Li Stanford
Lore McGovern MIBR MIT
Joel Oppenheim NYU
Pietro Perona Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua Mobileye
David Siegel Two Sigma
Susan Whitehead MIT Corporation
Jim Pallotta The Raptor group
Research Education amp Diversity Partners
Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba
Blum Gershman Kreiman Livingstone Sompolinsky Spelke
MIT Harvard
Chouika Manaye Rwebangira Salmani
Howard U
Hunter College
Isik
Johns Hopkins U
Brumberg
Queens College
Chodorow Epstein Sakas Zeigler
Freiwald
Rockefeller U
Stanford U
Jorquera
Universidad Central Del Caribe (UCC)
McNair Program
University of Central Florida
Goodman
Blaser Ciaramitaro Pomplun Shukla
UMass Boston, UPR-Mayaguez, UPR-Rio Piedras
Hildreth Wiest WilmerWellesley College
Santiago, Vega-Riveros, Garcia-Arraras, Maldonado-Vlaar, Megret, Ordonez, Ortiz-Zuazaga
Kreiman Livingstone
Harvard Medical School
Finlayson
Florida International U
Kreiman
Boston Children's Hospital
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bulthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger, Mobileye, Intel
Fujitsu
GE
Kaist
Videos: ~950 (May 2014 - April 2020)
(of YouTube subscribers only - 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo: A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex 2016.
See more at http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence
A self-reproducing community of scholars is being formed: ~>300 applicants, ~30 accepted
Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen, MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
23
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
26
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
27
Cognition in flies probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bulthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Gotz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for "Visual control of orientation behaviour in the fly", Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bulthoff. Biological Cybernetics 45, 123-130, 1982.
30
Cognition in flies
Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
bull The beetle follows the motion
bull Each photoreceptor sees only an alternation of dark and light how is motion computed
bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits The algorithm (refined by D Varju) explained many data Reichardt detector
bull The same model describes motion perception in flies beautiful papers on anatomy optics and organization of motion perception by Braitenberg Kirschfeld Goetz
bull An equivalent (ldquoenergyrdquo) model (Adelson) describes motion cells in primate cortex
bull A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff Little and Poggio Nature 1989)
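The Hassenstein-Reichardt correlation detector is compact enough to sketch in code: each half-detector multiplies one receptor's signal by a delayed (low-pass filtered) copy of its neighbor's, and an opponent stage subtracts the two halves. The sketch below is a minimal illustration with made-up parameters (time constant, stimulus frequency), not a model fitted to fly data.

```python
import numpy as np

def reichardt_detector(a, b, dt=1.0, tau=5.0):
    """Minimal Hassenstein-Reichardt correlation detector.

    a, b: signals from two neighboring photoreceptors.
    Each half-detector multiplies one input by a low-pass
    (delayed) copy of the other; opponent subtraction yields
    a signed, direction-selective output.
    """
    alpha = dt / (tau + dt)              # first-order low-pass coefficient
    lp_a, lp_b = np.zeros_like(a), np.zeros_like(b)
    for t in range(1, len(a)):           # causal low-pass filtering
        lp_a[t] = lp_a[t - 1] + alpha * (a[t] - lp_a[t - 1])
        lp_b[t] = lp_b[t - 1] + alpha * (b[t] - lp_b[t - 1])
    return lp_a * b - lp_b * a           # opponent stage

# A grating drifting one way reaches receptor b with a phase lag;
# reversing the motion flips the sign of the mean detector output.
t = np.arange(200.0)
rightward = reichardt_detector(np.sin(0.2 * t), np.sin(0.2 * t - 0.5))
leftward = reichardt_detector(np.sin(0.2 * t), np.sin(0.2 * t + 0.5))
print(rightward.mean() > 0, leftward.mean() < 0)   # opposite signs
```

The time-averaged output is positive for one direction of motion and negative for the other, the signature Reichardt and colleagues probed behaviorally through the optomotor response.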
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
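The veto scheme lends itself to a toy sketch. Below, a unit's excitatory input is divisively suppressed by the delayed signal from the neighboring receptor, a shunting-style veto in the spirit of the synaptic mechanism proposed here; the delay, gain and stimulus values are illustrative choices of ours.

```python
def veto_unit(a, b, delay=2, k=10.0):
    """Barlow-Levick-style direction selectivity via an inhibitory veto.

    a: excitatory receptor signal; b: neighboring receptor whose
    delayed signal divisively suppresses ('vetoes') the response,
    as shunting inhibition would.
    """
    total = 0.0
    for t in range(len(a)):
        inhib = b[t - delay] if t >= delay else 0.0
        total += a[t] / (1.0 + k * inhib)      # divisive veto
    return total

n = 20
at_5 = [1.0 if i == 5 else 0.0 for i in range(n)]   # stimulus passes one receptor...
at_7 = [1.0 if i == 7 else 0.0 for i in range(n)]   # ...then the other, 2 steps later

# Null direction: b is stimulated first, so its delayed veto coincides with a.
null_resp = veto_unit(a=at_7, b=at_5, delay=2)
# Preferred direction: the veto arrives too late and the response survives.
pref_resp = veto_unit(a=at_5, b=at_7, delay=2)
print(pref_resp, null_resp)
```

In the null direction the delayed inhibition coincides with excitation and suppresses it; in the preferred direction it arrives too late, which is the asymmetry Barlow & Levick measured.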
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
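The standard regularization recipe for such ill-posed inverse problems, minimizing ||Az - y||^2 + lam * ||Pz||^2 with P a stabilizing operator such as a derivative, can be illustrated on a toy deblurring problem. The blur operator, noise level and lam below are our own illustrative choices, not the paper's examples.

```python
import numpy as np

# Ill-posed inverse problem: recover z from blurred, noisy data y = Az + noise.
# Regularized solution: minimize ||Az - y||^2 + lam * ||Pz||^2, which has the
# closed form z = (A^T A + lam * P^T P)^{-1} A^T y.

n = 100
x = np.linspace(0, 1, n)
z_true = np.sin(2 * np.pi * x)

A = np.zeros((n, n))                     # local averaging (blur) operator
for i in range(n):
    lo, hi = max(0, i - 3), min(n, i + 4)
    A[i, lo:hi] = 1.0 / (hi - lo)

rng = np.random.default_rng(0)
y = A @ z_true + 0.05 * rng.standard_normal(n)

P = np.eye(n) - np.eye(n, k=1)           # first-difference stabilizer

naive = np.linalg.lstsq(A, y, rcond=None)[0]   # unregularized: noise amplified
regularized = np.linalg.solve(A.T @ A + 1e-2 * P.T @ P, A.T @ y)
print(np.abs(naive - z_true).mean(), np.abs(regularized - z_true).mean())
```

The unregularized pseudo-inverse amplifies noise in the directions the blur nearly annihilates; the smoothness term restores a stable, essentially unique solution, which is the sense in which regularization cures ill-posedness.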
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
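The cooperative algorithm of this paper can be sketched in one dimension: nodes C[x, d] stand for "position x matches at disparity d"; matches at the same disparity excite nearby nodes (continuity), while matches along the same left or right line of sight inhibit one another (uniqueness). The neighborhood sizes, inhibition weight and threshold below are our illustrative 1-D choices, not the paper's 2-D values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dmax, true_d = 60, 3, 2
left = rng.integers(0, 2, n)
right = np.roll(left, true_d)              # a random-dot pair with disparity 2

C = np.zeros((n, dmax + 1))
for d in range(dmax + 1):                  # initial (largely spurious) matches
    C[:, d] = left == right[(np.arange(n) + d) % n]
init = C.copy()

eps, theta = 1.0, 3.0
for _ in range(10):
    new = np.zeros_like(C)
    for x in range(n):
        for d in range(dmax + 1):
            # continuity: support from neighbors at the same disparity
            excit = sum(C[(x + k) % n, d] for k in (-3, -2, -1, 1, 2, 3))
            # uniqueness: inhibition along the left line of sight...
            inhib = C[x, :].sum() - C[x, d]
            for d2 in range(dmax + 1):     # ...and along the right line of sight
                if d2 != d:
                    inhib += C[(x + d - d2) % n, d2]
            new[x, d] = 1.0 if excit - eps * inhib + init[x, d] >= theta else 0.0
    C = new

recovered = C.argmax(axis=1)
print("fraction correct:", (recovered == true_d).mean())
```

After a few iterations the true-disparity layer supports itself and suppresses most spurious matches; Marr and Poggio demonstrated the analogous two-dimensional network solving random-dot stereograms.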
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  computation, algorithms, biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H_K}\ \left[\frac{1}{\ell}\sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2\right]$$
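For the square loss V(y, f(x)) = (y - f(x))^2, the representer theorem gives the minimizer of the functional above in closed form: f(x) = sum_i c_i K(x, x_i) with c = (K + mu*l*I)^{-1} y, i.e. kernel regularized least squares. A small self-contained sketch (the Gaussian kernel and toy data are our choices):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.2):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)                        # training inputs
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

mu = 1e-3                                        # regularization parameter
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + mu * len(x) * np.eye(len(x)), y)   # c = (K + mu*l*I)^{-1} y

x_new = np.linspace(0, 1, 200)
f_new = gaussian_kernel(x_new, x) @ c            # f(x) = sum_i c_i K(x, x_i)
err = np.abs(f_new - np.sin(2 * np.pi * x_new)).mean()
print("mean abs error:", err)
```

The regularization parameter mu trades data fit against the smoothness enforced by the RKHS norm, exactly the trade-off in the functional above.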
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
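The stability property in the abstract can be illustrated numerically: train on S, retrain with one example deleted, and measure how much the learned hypothesis changes. The sketch below does this for ridge (regularized least-squares) regression, where stronger regularization makes the learning map more stable; the data and parameter values are our own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = 0.5 * X @ rng.standard_normal(p) + rng.standard_normal(n)   # noisy targets

def fit(X, y, lam):
    # ridge regression: w = (X^T X + lam*I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loo_instability(lam):
    """Largest change in any prediction when one training point is deleted."""
    w_full = fit(X, y, lam)
    gaps = []
    for i in range(n):
        keep = np.arange(n) != i
        w_i = fit(X[keep], y[keep], lam)
        gaps.append(np.abs(X @ (w_full - w_i)).max())
    return max(gaps)

weak, strong = loo_instability(lam=1e-2), loo_instability(lam=100.0)
print(weak, strong)
```

Deleting one example moves the heavily regularized hypothesis less than the weakly regularized one, the qualitative behavior the paper's leave-one-out stability condition formalizes.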
One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label
In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
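As a concrete sketch of ERM with the square loss: for that loss, an ordinary least-squares fit is exactly empirical risk minimization over its hypothesis space. The polynomial hypothesis space, toy data distribution, and degree below are illustrative assumptions, not from the paper.

```python
import numpy as np

# ERM sketch with the square loss. Hypothesis space: polynomials of
# degree <= 3 (an illustrative choice). For the square loss, least
# squares IS empirical risk minimization over this hypothesis space.
rng = np.random.default_rng(0)

def erm_polynomial(S, degree):
    """Return the hypothesis in H (polynomials of `degree`) minimizing I_S[f]."""
    x, y = S
    return np.poly1d(np.polyfit(x, y, degree))

def empirical_error(f, S):
    """I_S[f] = (1/n) sum_i V(f, z_i), with V the square loss."""
    x, y = S
    return float(np.mean((f(x) - y) ** 2))

# Toy training set S = {(x_i, y_i)}: noisy samples of an unknown target.
x = rng.uniform(-1, 1, 30)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=30)
S = (x, y)

f_S = erm_polynomial(S, degree=3)

# Generalization: the empirical error should track the expected error,
# estimated here on fresh samples from the same distribution.
x_new = rng.uniform(-1, 1, 1000)
y_new = np.sin(np.pi * x_new) + 0.1 * rng.normal(size=1000)
print("training error:", empirical_error(f_S, S))
print("test error:", empirical_error(f_S, (x_new, y_new)))
```

On this toy problem the gap between training and test error is small, which is exactly the generalization property defined in Box 1.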
Box 1 | Formal definitions in supervised learning
Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (written $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \geq \varepsilon) = 0$$
Training data. The training data comprise input and output pairs. The input space $X$ is assumed to be a compact domain in a Euclidean space, and the output space $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$
Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.
Loss functions. We denote by $V(f, z)$ the price we pay when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.
Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error on a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S
IS$f 1
n
X
n
i1
Vf zi
Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} P\left(I[f_S] \leq \inf_{f \in \mathcal{H}} I[f] + \varepsilon\right) = 1$$
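With these definitions in hand, the deleted-one-example perturbation behind the stability property in the abstract can be sketched numerically. The least-squares polynomial learner and all constants here are illustrative assumptions; this is not the paper's formal stability definition.

```python
import numpy as np

# Leave-one-out perturbation sketch: delete one example, retrain, and
# measure how much the learned hypothesis changes at the deleted point.
rng = np.random.default_rng(1)

n = 200
x = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=n)

def learn(xs, ys):
    """The learning map L: training set -> hypothesis (least-squares cubic)."""
    return np.poly1d(np.polyfit(xs, ys, 3))

f_S = learn(x, y)          # hypothesis from the full training set S

# |f_S(x_i) - f_{S^i}(x_i)|, where S^i is S with example i deleted.
changes = []
for i in range(n):
    keep = np.arange(n) != i
    f_Si = learn(x[keep], y[keep])
    changes.append(abs(f_S(x[i]) - f_Si(x[i])))

# For a stable algorithm this stays small (and shrinks as n grows).
print("max change at deleted point:", max(changes))
```

For this learner the hypothesis barely moves when any single example is removed, which is the intuition the letter formalizes and connects to generalization.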
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database: 1,000+ real, 3,000+ virtual faces; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain:
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis & Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model): animal present or not?
[Figure: trial sequence: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms]
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
CBMM Participants
[Chart: participant counts by role (Faculty, Research Scientists, Postdocs, Grad Students, Undergrads, Staff, Other, Total) for Years 1 through 7]
EAC
Demis Hassabis DeepMind
Charles Isbell Jr Georgia Tech
Christof Koch Allen Institute
Fei-Fei Li Stanford
Lore McGovern MIBR MIT
Joel Oppenheim NYU
Pietro Perona Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua Mobileye
David Siegel Two Sigma
Susan Whitehead MIT Corporation
Jim Pallotta The Raptor group
Research Education amp Diversity Partners
Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba
Blum Gershman Kreiman Livingstone Sompolinsky Spelke
MIT Harvard
Chouika Manaye Rwebangira Salmani
Howard U
Hunter College
Isik
Johns Hopkins U
Brumberg, Queens College
Chodorow, Epstein, Sakas, Zeigler (Hunter College); Freiwald
Rockefeller U
Jorquera, Stanford U
Universidad Central Del Caribe (UCC)
McNair Program
University of Central Florida
Goodman
Blaser Ciaramitaro Pomplun Shukla
UMass Boston; UPR-Mayagüez; UPR-Río Piedras
Hildreth, Wiest, Wilmer (Wellesley College)
Santiago, Vega-Riveros, García-Arrarás, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga
Kreiman Livingstone
Harvard Medical School
Finlayson, Florida International U
Kreiman
Boston Childrenrsquos Hospital
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger Mobileye Intel
Fujitsu
GE
Kaist
Videos: ~950 (May 2014 - April 2020)
(YouTube subscribers only: ~18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM SummerSchool
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für Biologische Kybernetik (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other:
single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
bull The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light; how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
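A minimal sketch of the Hassenstein-Reichardt correlation scheme described above: two photoreceptor signals, one low-pass-delayed, are multiplied in mirror-symmetric half-detectors whose outputs are subtracted. The first-order low-pass filter, its time constant, and the stimulus are illustrative assumptions, not values from the original papers.

```python
import numpy as np

def lowpass(signal, alpha=0.1):
    """First-order low-pass filter: the 'delay' stage of the detector."""
    out = np.zeros_like(signal)
    for t in range(1, len(signal)):
        out[t] = out[t - 1] + alpha * (signal[t] - out[t - 1])
    return out

def reichardt(a, b):
    """Opponent output: positive for motion from receptor a toward b."""
    return np.mean(lowpass(a) * b - a * lowpass(b))

# A drifting grating seen by two adjacent receptors: b receives the
# same luminance signal as a, slightly later in time.
t = np.linspace(0, 10, 1000)
a = np.sin(2 * np.pi * t)
b = np.sin(2 * np.pi * (t - 0.1))   # delayed copy = motion from a to b

print(reichardt(a, b))   # positive: preferred direction
print(reichardt(b, a))   # negative: null direction
```

The sign flip between the two calls is the directional selectivity; the multiplication is the nonlinear interaction Hassenstein & Reichardt showed to be essential.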
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies many of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
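In its standard Tikhonov form, the regularization of an ill-posed problem $Az = y$ amounts to finding the $z$ that minimizes

```latex
\min_{z}\;\; \|Az - y\|^{2} + \lambda \|Pz\|^{2}
```

where $A$ models the data-formation process, $P$ is a stabilizing operator (typically a differential operator enforcing smoothness), and the regularization parameter $\lambda$ controls the trade-off between closeness to the data and the strength of the physical constraint.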
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity; the tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
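In symbols, the decomposition just described is $\mathbf{V} = v^{\perp}\mathbf{n} + v^{T}\mathbf{t}$, with only the normal component $v^{\perp}$ measurable locally (the aperture problem). A regularized estimate of the full field along the contour, in the smoothness-based form studied by Hildreth, then minimizes a functional of the type

```latex
\int \left[ \left( \mathbf{V}\cdot\mathbf{n} - v^{\perp} \right)^{2}
          + \lambda \left\| \frac{\partial \mathbf{V}}{\partial s} \right\|^{2} \right] ds
```

where $s$ is arc length along the contour and $\mathbf{n}$, $\mathbf{t}$ are the unit normal and tangent. The first term enforces fidelity to the measured normal components; the smoothness term selects, among all velocity fields consistent with those data, the smoothest one.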
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
B y V T O R R E f AND T P O G G IO j
f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany
(Communicated by B B Boycott FR8 - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process
Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina
Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay
Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article, we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
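As a rough illustration of the cooperative disparity algorithm this paper describes, the following sketch runs a 1-D analogue of the network on a random-dot stereogram: each node C(x, d) receives excitation from neighbours in the same disparity plane and inhibition from nodes along the same line of sight. The 1-D simplification, the neighbourhood size, the threshold, and the use of only one eye's line of sight are assumptions made for brevity; the original operates on 2-D images.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D random-dot stereogram: a central patch at disparity 2 on a disparity-0 ground.
width, ndisp = 64, 5
left = rng.integers(0, 4, width + ndisp)   # multi-valued "dots"
shift = np.zeros(width, dtype=int)
shift[20:44] = 2                           # central patch shifted by 2
right = left[np.arange(width) + shift]

# Initial state: node (x, d) is on wherever the two images match at disparity d.
x = np.arange(width)
C0 = (right[:, None] == left[x[:, None] + np.arange(ndisp)]).astype(float)
C = C0.copy()

for _ in range(6):
    # Excitation: neighbouring nodes within the same disparity plane.
    excite = sum(np.roll(C, k, axis=0) for k in range(-2, 3))
    # Inhibition: nodes at other disparities along the same line of sight.
    inhibit = C.sum(axis=1, keepdims=True) - C
    C = (excite - inhibit + C0 >= 3.5).astype(float)   # threshold update

# After a few iterations the true disparity plane dominates in each region.
bg, centre = C[5:15], C[25:39]
assert bg[:, 0].mean() > max(bg[:, d].mean() for d in range(1, ndisp))
assert centre[:, 2].mean() > max(centre[:, d].mean() for d in (0, 1, 3, 4))
```

The update embodies the two constraints the paper analyzes: continuity (excitation within a disparity plane) and uniqueness (inhibition across disparities at one line of sight); spurious chance matches decay while the coherent planes sustain themselves.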
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels: computation, algorithms, biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
min_{f ∈ H_K} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ||f||²_K ]
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory[1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z: S = (z_1, ..., z_n) = ((x_1, y_1), ..., (x_n, y_n)).
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².
Expected error. The expected error of a function f is defined as I[f] = ∫_Z V(f, z) dμ(z), which is also the expected error of a new sample z drawn from the distribution. In the case of square loss, I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y). We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S: I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ, lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability. An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0, lim_{n→∞} P(I[f_S] > inf_{f∈H} I[f] + ε) = 0.
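The gap |I[f_S] − I_S[f_S]| that generalization requires to vanish can be watched numerically. The sketch below runs ERM over a toy hypothesis space of 1-D threshold classifiers, chosen so that the expected error is computable in closed form; the target, the 10% label noise, and the threshold grid are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothesis space: threshold functions f_t(x) = 1[x >= t] on [0, 1].
thresholds = np.linspace(0, 1, 101)

def erm(x, y):
    """Empirical risk minimisation: pick the threshold with lowest training error."""
    errs = [np.mean((x >= t).astype(int) != y) for t in thresholds]
    return thresholds[int(np.argmin(errs))]

def gap(n, trials=200):
    """Average |expected error - empirical error| of the ERM solution."""
    gaps = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = (x >= 0.5).astype(int)                          # true threshold 0.5
        y = np.where(rng.uniform(size=n) < 0.1, 1 - y, y)   # 10% label noise
        t = erm(x, y)
        emp = np.mean((x >= t).astype(int) != y)            # I_S[f_S]
        exp_err = 0.1 + 0.8 * abs(t - 0.5)                  # I[f_S], closed form
        gaps.append(abs(exp_err - emp))
    return float(np.mean(gaps))

assert gap(400) < gap(20)   # the generalization gap shrinks as n grows
```

The empirical error of the selected hypothesis is optimistically biased (overfitting), which is exactly why the gap is largest for small training sets; stability-based conditions bound how fast it must shrink.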
letters to nature
Nature | Vol 428 | 25 March 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work
• Training database: 1,000+ real faces, 3,000+ virtual faces, 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies), 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
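The model's two basic operations, template matching ("simple"-cell tuning) and max pooling ("complex"-cell invariance), can be caricatured in a few lines. This is a toy sketch of the alternating S/C architecture, not the published implementation; the template sizes, pooling range, Gaussian tuning width, and random inputs are all arbitrary choices for illustration.

```python
import numpy as np

def s_layer(image, templates):
    """'Simple'-cell stage: Gaussian tuning to templates at every position."""
    h, w = templates.shape[1:]
    out = np.empty((len(templates), image.shape[0] - h + 1, image.shape[1] - w + 1))
    for k, tpl in enumerate(templates):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + h, j:j + w]
                out[k, i, j] = np.exp(-np.sum((patch - tpl) ** 2) / (2.0 * tpl.size))
    return out

def c_layer(s, pool=4):
    """'Complex'-cell stage: max pooling over position gives local invariance."""
    k, h, w = s.shape
    return np.array([[[s[c, i:i + pool, j:j + pool].max()
                       for j in range(0, w - pool + 1, pool)]
                      for i in range(0, h - pool + 1, pool)]
                     for c in range(k)])

rng = np.random.default_rng(0)
templates = rng.standard_normal((4, 3, 3))
img = rng.standard_normal((16, 16))
shifted = np.roll(img, 1, axis=1)          # small translation of the same image
s1, s2 = s_layer(img, templates), s_layer(shifted, templates)
r1, r2 = c_layer(s1), c_layer(s2)
# Max pooling makes the C-layer response more stable under the shift
# than the raw S-layer response.
assert np.abs(r1 - r2).mean() < np.abs(s1 - s2).mean()
```

Stacking such S/C pairs is what produces the gradual growth of receptive field size, preferred-stimulus complexity, and invariance described for the V1-V2-V4-IT hierarchy above.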
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Trial sequence: image (20 ms), interval image-mask (ISI, 30 ms), mask (1/f noise, 80 ms)
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
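In the spirit of the readout experiments, one can simulate a population whose category signal survives a change of "position" and train a linear classifier at one position, then test at another. The population model (position as a neuron-wise gain change), the noise level, and the least-squares readout are illustrative assumptions, not the paper's data or methods.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "IT population": each neuron carries a fixed category signal, scaled by a
# position-dependent gain, plus noise.
n_neurons, n_trials = 60, 200
cat_tuning = rng.standard_normal(n_neurons)           # category selectivity
gain = {"pos_a": rng.uniform(0.5, 1.5, n_neurons),
        "pos_b": rng.uniform(0.5, 1.5, n_neurons)}    # position scales gain only

def population(labels, pos):
    base = np.outer(labels, cat_tuning)               # category signal
    return base * gain[pos] + 0.5 * rng.standard_normal((len(labels), n_neurons))

labels = rng.choice([-1.0, 1.0], n_trials)
X_train = population(labels, "pos_a")
# Least-squares linear readout (a stand-in for the classifiers used in readout work).
w, *_ = np.linalg.lstsq(X_train, labels, rcond=None)
test_labels = rng.choice([-1.0, 1.0], n_trials)
X_test = population(test_labels, "pos_b")             # novel position
acc = np.mean(np.sign(X_test @ w) == test_labels)
assert acc > 0.9
```

Because the category signal stays aligned across the gain change, a readout trained at one position transfers to the other: the "matrix-like" decoding idea in miniature.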
… in 2013 …
EAC
Demis Hassabis DeepMind
Charles Isbell Jr Georgia Tech
Christof Koch Allen Institute
Fei-Fei Li Stanford
Lore McGovern MIBR MIT
Joel Oppenheim NYU
Pietro Perona Caltech
Marc Raibert, Boston Dynamics
Judith Richter, Medinol
Kobi Richter, Medinol
Amnon Shashua Mobileye
David Siegel Two Sigma
Susan Whitehead MIT Corporation
Jim Pallotta The Raptor group
Research Education amp Diversity Partners
Boyden Desimone DiCarlo Kaelbling Kanwisher Katz McDermott Oliva Poggio Roy Sassanfar Saxe Schulz Tegmark Tenenbaum Ullman Wilson Torralba
Blum Gershman Kreiman Livingstone Sompolinsky Spelke
MIT Harvard
Chouika Manaye Rwebangira Salmani
Howard U
Hunter College
Isik
Johns Hopkins U
Brumberg, Queens College
Chodorow Epstein Sakas Zeigler Freiwald
Rockefeller U
Stanford U: Jorquera
Universidad Central Del Caribe (UCC)
McNair Program
University of Central Florida
Goodman
Blaser Ciaramitaro Pomplun Shukla
UMass Boston, UPR-Mayagüez, UPR-Río Piedras
Hildreth, Wiest, Wilmer: Wellesley College
Santiago, Vega-Riveros, Garcia-Arraras, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga
Kreiman Livingstone
Harvard Medical School
Finlayson, Florida International U
Kreiman
Boston Children's Hospital
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger Mobileye Intel
Fujitsu
GE
Kaist
Videos - ~950 (May 2014 - April 2020)
(YouTube subscribers only – 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu.
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
A cognitive theory of basic fly instincts predicts the trajectory of the chasing fly…
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to the Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
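A minimal numerical sketch of the Hassenstein-Reichardt correlation detector described above. The signal shapes, the first-order low-pass filter standing in for the delay channel, and all constants are illustrative choices, not values from the original papers:

```python
import numpy as np

def lowpass(s, tau=5.0):
    # first-order low-pass filter acting as the "delay" channel
    a = np.exp(-1.0 / tau)
    out = np.zeros_like(s, dtype=float)
    for t in range(1, len(s)):
        out[t] = a * out[t - 1] + (1 - a) * s[t]
    return out

def reichardt_detector(s1, s2):
    # delayed signal from one receptor multiplied with the direct signal of its
    # neighbour; opponent subtraction of the mirror-symmetric half-detector
    # yields a direction-selective response
    return lowpass(s1) * s2 - lowpass(s2) * s1

t = np.arange(200)
s1 = np.sin(2 * np.pi * t / 20)          # grating seen by receptor 1
s2 = np.sin(2 * np.pi * (t - 2) / 20)    # receptor 2 sees it 2 samples later
print(reichardt_detector(s1, s2).mean())  # positive: motion from 1 towards 2
```

Swapping the inputs (motion in the opposite direction) flips the sign of the mean response, which is the opponent, direction-selective behavior the detector was designed to produce.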
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
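The two schemes contrasted in this passage can be put side by side in a toy sketch. This is not the paper's model: delays are plain sample shifts, a multiplication stands in for the Hassenstein-Reichardt scheme, and a rectified subtraction stands in for the Barlow-Levick veto; all constants are illustrative.

```python
import numpy as np

def delayed(s, d=2):
    # a pure delay of d samples (zero-padded at the start)
    out = np.zeros_like(s)
    out[d:] = s[:-d]
    return out

def multiplicative(s1, s2):
    # Hassenstein-Reichardt-style subunit: nonlinear (multiplicative)
    # conjunction of the delayed receptor-1 signal with the receptor-2 signal
    return delayed(s1) * s2

def veto(s1, s2):
    # Barlow-Levick-style subunit: excitation from receptor 1 is cancelled
    # ("vetoed") by delayed inhibition from receptor 2, which coincides with
    # it for motion in the null direction (receptor 2 leading)
    return np.clip(s1 - delayed(s2), 0.0, None)

t = np.arange(100)
bar = (t % 25 < 5).astype(float)     # a bright bar sweeping past repeatedly
s1, s2 = bar, delayed(bar)           # preferred direction: receptor 1 leads
r1, r2 = delayed(bar), bar           # null direction: receptor 2 leads
print(multiplicative(s1, s2).sum(), multiplicative(r1, r2).sum())
print(veto(s1, s2).sum(), veto(r1, r2).sum())
```

Both subunits respond more in the preferred direction; the veto scheme suppresses the null-direction response entirely, which echoes the inhibitory 'veto' Barlow & Levick inferred experimentally.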
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6,
one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions
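The regularization recipe the paper develops can be illustrated on the simplest ill-posed case: recovering a smooth signal from noisy samples by minimizing a data term plus a smoothness (stabilizer) term. The measurement operator, the first-difference stabilizer, and the value of the regularization parameter below are illustrative choices, not those of the paper:

```python
import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n)
true = np.sin(2 * np.pi * x)                  # the "physical" signal to recover
rng = np.random.default_rng(0)
data = true + 0.3 * rng.normal(size=n)        # noisy measurements

A = np.eye(n)                                 # measurement operator: direct noisy samples
P = np.diff(np.eye(n), axis=0)                # stabilizer: first-difference (roughness) operator
lam = 5.0                                     # regularization parameter (illustrative)

# normal equations of  min_f ||A f - data||^2 + lam ||P f||^2
f = np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ data)
print(np.abs(f - true).mean(), np.abs(data - true).mean())
```

Without the stabilizer the problem would simply return the noisy data; the smoothness term restores a unique, stable estimate, which is the essence of turning an ill-posed problem into a well-posed one.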
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
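The spirit of the cooperative algorithm (binary disparity nodes, excitation within a disparity layer, inhibition along lines of sight, iterated thresholding) can be sketched in one dimension. The neighbourhood size, weights, and threshold here are illustrative, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_d = 40, 5                          # positions and candidate disparities
true_d = 2                                # scene: a single plane at disparity 2
# initial match matrix: the correct plane plus ~30% spurious matches
C = (rng.random((n_x, n_d)) < 0.3).astype(float)
C[:, true_d] = 1.0

for _ in range(10):
    excit = np.zeros_like(C)
    for dx in (-3, -2, -1, 1, 2, 3):      # excitation from same-disparity neighbours
        excit += np.roll(C, dx, axis=0)
    # inhibition from competing disparities at the same position (line of sight)
    inhib = C.sum(axis=1, keepdims=True) - C
    C = ((excit - 2.0 * inhib) > 1.5).astype(float)

print(int(C[:, true_d].sum()), int(C.sum() - C[:, true_d].sum()))
```

After a few iterations the mutually supporting matches on the true plane survive while the spurious matches, which lack same-disparity neighbours and are inhibited by the winning plane, die out: a global organization reached purely through local interactions.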
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis — in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
— computation — algorithms — biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V\!\left(y_i, f(x_i)\right) + \mu \, \|f\|_K^2$$
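With the square loss V(y, f(x)) = (y − f(x))² this regularized functional has a closed-form minimizer: f(x) = Σᵢ cᵢ K(xᵢ, x) with c = (K + μℓI)⁻¹ y, i.e. regularized least squares. A sketch with an illustrative Gaussian kernel and illustrative parameter choices:

```python
import numpy as np

def gram(a, b, sigma=0.2):
    # Gaussian kernel matrix K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2))
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)   # noisy training examples

mu = 1e-3
K = gram(x, x)
# c solves (K + mu * l * I) c = y; then f(x) = sum_i c_i K(x_i, x)
c = np.linalg.solve(K + mu * len(x) * np.eye(len(x)), y)

x_new = np.linspace(0.0, 1.0, 50)
f_new = gram(x_new, x) @ c                               # predictions at new points
print(np.abs(f_new - np.sin(2 * np.pi * x_new)).mean())
```

The regularization term μ‖f‖²_K plays the same stabilizing role as in the early-vision setting: without it, fitting the noisy examples exactly would be an ill-posed problem.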
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio1, Ryan Rifkin1,4, Sayan Mukherjee1,3 & Partha Niyogi2
1Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. 2Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. 3Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. 4Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
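The deletion-stability idea in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's formal stability definition: it trains a regularized least-squares learner (one algorithm the theory covers) on a training set S and on S with one example deleted, and measures how much the learned hypothesis changes; the data and regularization values are illustrative.

```python
import numpy as np

def train_rls(X, y, lam=0.1):
    """Regularized least squares: w = (X^T X + lam*n*I)^{-1} X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

def loo_stability(X, y, lam=0.1):
    """Largest change in prediction at a deleted point when that one
    training example is removed from S (a leave-one-out perturbation)."""
    n = X.shape[0]
    w_full = train_rls(X, y, lam)
    worst = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        w_i = train_rls(X[mask], y[mask], lam)
        # compare the two learned hypotheses at the held-out point z_i
        worst = max(worst, abs(X[i] @ w_full - X[i] @ w_i))
    return worst

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
print(loo_stability(X, y))  # a small number: the learning map is stable
```

For regularized least squares, stronger regularization (larger lam) makes the learning map more stable, which is one reason such algorithms generalize.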
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i), i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1. Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,
lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = (z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)).

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map
L : ∪_{n≥1} Z^n → H,
where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error on a new sample z drawn from the distribution. In the case of the square loss,
I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] > inf_{f∈H} I[f] + ε ) = 0.
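The definitions in Box 1 can be made concrete with a toy ERM sketch. This is an illustrative construction, not from the paper: the hypothesis space H is a finite grid of linear functions, the distribution μ is an invented noisy linear model, and the expected error is estimated on a large fresh sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothesis space H: linear functions f_a(x) = a*x, with a on a finite grid
H = np.linspace(-3.0, 3.0, 61)

def empirical_error(a, x, y):
    # I_S[f] = (1/n) sum_i V(f, z_i), with the square loss V = (f(x)-y)^2
    return np.mean((a * x - y) ** 2)

# Training set S: n i.i.d. samples from mu, here y = 2x + small noise
n = 200
x = rng.uniform(-1, 1, n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

# ERM: select the hypothesis minimizing the empirical error over H
a_erm = H[np.argmin([empirical_error(a, x, y) for a in H])]

# Estimate the expected error I[f] of the selected function on fresh samples
x_new = rng.uniform(-1, 1, 100000)
y_new = 2.0 * x_new + 0.1 * rng.normal(size=100000)
gap = abs(empirical_error(a_erm, x, y) - empirical_error(a_erm, x_new, y_new))
print(a_erm, gap)  # a_erm near 2.0; small gap = the algorithm generalized
```

The small gap between empirical and (estimated) expected error is exactly the generalization property defined in Box 1.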
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human Brain
– 10^10–10^11 neurons (~1 million flies)
– 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Stimulus sequence: image (20 ms) → blank interval (30 ms ISI) → mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models 'predict' rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
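The 'matrix-like read-out' idea, a linear classifier applied to a vector of population responses, can be sketched with simulated data. Everything below (unit count, noise level, the gain factor standing in for position/scale changes) is invented for illustration; the actual study decoded recorded IT responses.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated population of 64 "IT-like" units: each category evokes a
# characteristic mean response pattern; trials add noise and a random
# global gain (a crude stand-in for position and scale variation)
n_units, n_trials = 64, 400
patterns = rng.normal(size=(2, n_units))           # one template per category
labels = rng.integers(0, 2, n_trials)
gains = rng.uniform(0.5, 1.5, n_trials)[:, None]   # trial-by-trial gain
R = gains * patterns[labels] + 0.3 * rng.normal(size=(n_trials, n_units))

# Matrix-like linear readout: least-squares weights w with targets in {-1,+1}
y = 2.0 * labels - 1.0
train, test = slice(0, 300), slice(300, None)
w, *_ = np.linalg.lstsq(R[train], y[train], rcond=None)
acc = np.mean(np.sign(R[test] @ w) == y[test])
print(acc)  # high accuracy despite the gain variation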
... in 2013 ...
Research Education amp Diversity Partners
Boyden, Desimone, DiCarlo, Kaelbling, Kanwisher, Katz, McDermott, Oliva, Poggio, Roy, Sassanfar, Saxe, Schulz, Tegmark, Tenenbaum, Ullman, Wilson, Torralba
Blum, Gershman, Kreiman, Livingstone, Sompolinsky, Spelke
MIT Harvard
Chouika, Manaye, Rwebangira, Salmani
Howard U
Hunter College
Isik
Johns Hopkins U
Brumberg
Queens College
Chodorow, Epstein, Sakas, Zeigler
Freiwald
Rockefeller U
Jorquera
Stanford U
Universidad Central Del Caribe (UCC)
McNair Program
University of Central Florida
Goodman
Blaser, Ciaramitaro, Pomplun, Shukla
UMass Boston, UPR–Mayagüez, UPR–Río Piedras
Hildreth, Wiest, Wilmer
Wellesley College
Santiago, Vega-Riveros, Garcia-Arraras, Maldonado-Vlaar, Megret, Ordóñez, Ortiz-Zuazaga
Kreiman, Livingstone
Harvard Medical School
Finlayson
Florida International U
Kreiman
Boston Children's Hospital
Museum of Science Boston
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger, Mobileye, Intel
Fujitsu
GE
Kaist
Videos: ~950 (May 2014 – April 2020)
(of YouTube subscribers only: 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu
A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a 'deep' introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation + Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185–203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811–815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Parts I + II, Quart. Rev. Biophysics 9(3), 311–375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly ...
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123–130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479–480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly... similar to the Bayesian approach to cognition in humans... no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ('energy') model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
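The correlation scheme described above can be sketched in a few lines. This is a toy sketch of the Hassenstein-Reichardt detector, not the original formulation; the signals, offsets and delay below are illustrative.

```python
import numpy as np

def reichardt(left, right, delay):
    """Correlation-type (Hassenstein-Reichardt) motion detector.

    Each subunit multiplies one receptor's delayed signal with the
    neighbouring receptor's undelayed signal; subtracting the two
    mirror-symmetric subunits gives a mean output whose sign reports
    the direction of motion.
    """
    d = delay
    a = left[:-d] * right[d:]    # left receptor delayed, correlated with right
    b = right[:-d] * left[d:]    # right receptor delayed, correlated with left
    return np.mean(a - b)

# Two nearby receptors sampling a drifting sinusoidal luminance pattern;
# for rightward motion, the right receptor sees the same waveform later.
t = np.linspace(0.0, 100.0, 10000)
phase = 1.0                       # spatial offset between the two receptors
rightward = reichardt(np.sin(t), np.sin(t - phase), delay=100)
leftward = reichardt(np.sin(t), np.sin(t + phase), delay=100)
print(rightward, leftward)        # opposite signs for the two directions
```

Each photoreceptor alone sees only an alternation of dark and light; direction emerges solely from the delay-and-multiply correlation between neighbours, which is the point of the slide.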
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons...
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409–416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
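The 'veto' mechanism of Barlow & Levick described above can be sketched numerically with a divisive (shunting-style) interaction. This is a toy sketch, not the specific synaptic model proposed in the paper; the gain g, delay, and pulse shapes are all illustrative.

```python
import numpy as np

def ds_response(exc, inh, delay, g=50.0):
    """Direction-selective unit with delayed, veto-like inhibition.

    Excitation from one receptor is divided by delayed inhibition from
    its neighbour: r(t) = E(t) / (1 + g * I(t - delay)). When the
    delayed inhibition coincides with the excitation (the null
    direction), the response is vetoed.
    """
    d = delay
    return np.mean(exc[d:] / (1.0 + g * inh[:-d]))

def pulse(t0, t, width=0.05):
    """Receptor response to a bright bar crossing it at time t0."""
    return np.exp(-0.5 * ((t - t0) / width) ** 2)

t = np.linspace(0.0, 2.0, 2000)  # 2 s sampled at ~1 kHz
# Excitation is taken from receptor B, inhibition from receptor A; the
# inhibitory delay (~0.2 s = 200 samples) matches the A-to-B travel time.
# Motion A -> B: A fires at 0.5 s, B at 0.7 s, so the delayed inhibition
# from A arrives exactly when B's excitation does (null direction).
null = ds_response(exc=pulse(0.7, t), inh=pulse(0.5, t), delay=200)
# Motion B -> A: B fires at 0.5 s, A at 0.7 s; the inhibition arrives
# far too late to cancel the excitation (preferred direction).
pref = ds_response(exc=pulse(0.5, t), inh=pulse(0.7, t), delay=200)
print(pref, null)  # the preferred direction escapes the veto
```

The asymmetry comes entirely from the timing of the inhibition relative to the excitation, which is the experimental point of Barlow & Levick summarized in the text.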
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½D sketch and to Barrow and Tennenbaum's intrinsic images(5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman(6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
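The regularization idea can be illustrated on exactly this aperture problem. The sketch below is a minimal numpy construction, not the paper's algorithm: at each contour point only the normal component of velocity is measured (one equation, two unknowns), and a quadratic smoothness term of the Tikhonov form, min ||Av − c||² + λ||Dv||², makes the ill-posed problem well posed. The contour, noise level and λ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# A closed contour (a circle) translating rigidly; at each of n points
# we only observe the normal component c_i = n_i . v_i of the velocity.
n_pts = 100
theta = np.linspace(0, 2 * np.pi, n_pts, endpoint=False)
normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
v_true = np.array([1.0, 0.5])                    # true rigid translation
c = normals @ v_true + 0.01 * rng.normal(size=n_pts)

# Unknowns v = (vx_1, vy_1, ..., vx_n, vy_n). The data term alone is
# underdetermined; add a smoothness regularizer sum_i |v_{i+1} - v_i|^2.
A = np.zeros((n_pts, 2 * n_pts))
for i in range(n_pts):
    A[i, 2 * i:2 * i + 2] = normals[i]
D = np.zeros((2 * n_pts, 2 * n_pts))             # cyclic first differences
for i in range(n_pts):
    j = (i + 1) % n_pts
    D[2 * i:2 * i + 2, 2 * i:2 * i + 2] = np.eye(2)
    D[2 * i:2 * i + 2, 2 * j:2 * j + 2] = -np.eye(2)

# Tikhonov solution: (A^T A + lam * D^T D) v = A^T c
lam = 1.0
v = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ c).reshape(n_pts, 2)
print(np.abs(v - v_true).max())  # the full velocity field is recovered
```

The data term alone has a one-dimensional null space at every point (the tangential component); the smoothness term selects the unique smooth field consistent with the normal measurements, which is the sense in which regularization resolves the ambiguity described above.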
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
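The two schemes in figure 1 can be put in runnable form. Below is a toy discrete-time sketch (hypothetical signals and parameters, not the paper's biophysical model): channel 2 carries a delayed copy of its receptor signal, and the output is either a multiplicative correlation (Hassenstein-Reichardt) or a subtractive 'veto' (Barlow-Levick). Each toy detector responds to one direction of a moving pulse and not the other; the direction conventions here are arbitrary.

```python
# Toy sketch of the two directional-selectivity schemes (hypothetical
# parameters; not the biophysical model proposed in the paper).

def delayed(signal, d):
    """Channel 2: the receptor signal delayed by d time steps."""
    return [0.0] * d + list(signal[:len(signal) - d])

def reichardt(s1, s2, d=2):
    """Hassenstein-Reichardt: multiplicative (correlation) interaction
    of channel 1 with the delayed channel 2."""
    return sum(a * b for a, b in zip(s1, delayed(s2, d)))

def barlow_levick(s1, s2, d=2, veto=5.0):
    """Barlow-Levick: excitation from channel 1 is 'vetoed' by strong,
    appropriately delayed inhibition from channel 2."""
    return sum(max(a - veto * b, 0.0) for a, b in zip(s1, delayed(s2, d)))

# A pulse sweeping across the two adjacent receptors, in both directions:
seq_12 = ([1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0])  # crosses receptor 1 first
seq_21 = ([0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0])  # crosses receptor 2 first

print(reichardt(*seq_12), reichardt(*seq_21))          # selective: 0 vs 1
print(barlow_levick(*seq_12), barlow_levick(*seq_21))  # selective: 1.0 vs 0.0
```

The multiplicative scheme responds when the delayed copy coincides with the other channel's signal; the veto scheme is silent precisely when that coincidence occurs, illustrating the inhibitory alternative.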
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception, and Vision provides inspiration for their continuing study.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation - algorithms - biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\left(y_i, f(x_i)\right) + \mu \|f\|_K^2 \right]
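The slide's regularization functional, empirical error plus the regularizer μ‖f‖²_K, has a closed-form minimizer for the square loss (regularized least squares in an RKHS, via the representer theorem). A minimal sketch, assuming a Gaussian kernel and hypothetical 1-D data:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit_regularized(X, y, mu=0.1, sigma=1.0):
    """Minimize (1/n) sum V(y_i, f(x_i)) + mu ||f||_K^2 with square loss.
    By the representer theorem, f(x) = sum_i c_i K(x, x_i), with
    coefficients c = (K + n*mu*I)^{-1} y."""
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + n * mu * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Hypothetical example: learn sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
f = fit_regularized(X, y, mu=1e-3)
```

Larger μ trades training error for a smoother (smaller-norm) hypothesis, which is exactly the predictivity lever the surrounding slides refer to.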
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
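The stability property described in the abstract (delete one training example and the learned hypothesis changes little) can be probed numerically. A toy sketch with a hypothetical regularized least-squares learner, not the paper's formal leave-one-out stability definitions:

```python
import numpy as np

def train(X, y, mu=0.1):
    """Regularized linear least squares: w = (X^T X + n*mu*I)^{-1} X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * mu * np.eye(d), X.T @ y)

def loo_stability(X, y, mu=0.1):
    """Largest change in the prediction at a held-out point when that
    point is deleted from the training set."""
    n = len(y)
    w_full = train(X, y, mu)
    changes = []
    for i in range(n):
        mask = np.arange(n) != i
        w_i = train(X[mask], y[mask], mu)
        changes.append(abs(X[i] @ w_full - X[i] @ w_i))
    return max(changes)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
# With more data, deleting one example perturbs the hypothesis less:
print(loo_stability(X, y), loo_stability(X[:20], y[:20]))
```

The perturbation shrinks roughly like 1/n, which is the sense in which a stable learning map becomes predictive as the training set grows.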
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i), i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
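ERM as just described can be made concrete. A hypothetical toy example, not from the paper, in which the hypothesis space is a small set of threshold classifiers and ERM selects the one with smallest training error:

```python
# Toy empirical risk minimization over a finite hypothesis space of
# threshold classifiers h_t(x) = 1 if x >= t else 0 (hypothetical example).

def empirical_error(h, S):
    """Fraction of training examples the hypothesis misclassifies."""
    return sum(h(x) != y for x, y in S) / len(S)

def erm(hypotheses, S):
    """ERM: select the hypothesis minimizing the training error."""
    return min(hypotheses, key=lambda h: empirical_error(h, S))

thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
hypotheses = [lambda x, t=t: int(x >= t) for t in thresholds]

# Labels generated by the (unknown to the learner) true threshold 0.5:
S = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
best = erm(hypotheses, S)
```

Here ERM recovers the threshold at 0.5, the unique hypothesis with zero training error; whether such a choice also has low expected error is exactly the generalization question formalized in Box 1.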
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n->∞} |X_n - X| = 0 in probability) if and only if for every ε > 0, lim_{n->∞} P(|X_n - X| > ε) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n -> H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.
Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z)
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) - y)^2 dμ(x, y)
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n->∞} |I[f_S] - I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n->∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1
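The quantities in Box 1 can be estimated numerically for a concrete learner. A sketch, assuming a hypothetical distribution μ (uniform x, linear y plus noise), square loss, and least-squares line fitting as the learning map; the expected error I[f_S] is approximated by Monte Carlo on fresh samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_z(n):
    """Draw n samples from a hypothetical distribution mu(x, y):
    x uniform on [0, 1], y = 2x + Gaussian noise."""
    x = rng.uniform(0, 1, n)
    y = 2 * x + 0.1 * rng.normal(size=n)
    return x, y

def square_loss(f, x, y):
    return (f(x) - y) ** 2

# The learning map L: least-squares fit of a line (a symmetric algorithm).
x_tr, y_tr = sample_z(50)
a, b = np.polyfit(x_tr, y_tr, 1)
f_S = lambda x: a * x + b

I_S = square_loss(f_S, x_tr, y_tr).mean()   # empirical error I_S[f_S]
x_te, y_te = sample_z(100_000)              # fresh samples ~ mu
I = square_loss(f_S, x_te, y_te).mean()     # Monte Carlo estimate of I[f_S]
gap = abs(I - I_S)                          # generalization gap
```

For this stable, low-capacity learner the gap is tiny; the algorithm generalizes in the sense of the definition above.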
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419 | © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real, 3,000+ VIRTUAL face patterns; 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human Brain
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
- ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
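The build-up of receptive field size, selectivity and invariance along such a hierarchy can be sketched with alternating template-matching and max-pooling layers, in the spirit of the feedforward models cited elsewhere in this deck. This is a 1-D toy with hypothetical random templates, not an actual HMAX implementation:

```python
import numpy as np

def s_layer(x, templates):
    """Template matching ('simple'-like units): each local window is
    compared to stored templates, building selectivity for more
    complex preferred stimuli."""
    w = templates.shape[1]
    windows = np.stack([x[i:i + w] for i in range(len(x) - w + 1)])
    return np.stack([windows @ t for t in templates])  # (n_templates, n_pos)

def c_layer(r, pool):
    """Max pooling ('complex'-like units): max over neighbouring
    positions, enlarging receptive fields and building position tolerance."""
    n = r.shape[1] // pool * pool
    return r[:, :n].reshape(r.shape[0], -1, pool).max(axis=2)

rng = np.random.default_rng(0)
templates = rng.normal(size=(4, 3))     # hypothetical low-level templates

pattern = rng.normal(size=3)            # the same feature at two positions
x1 = np.zeros(16); x1[2:5]  = pattern
x2 = np.zeros(16); x2[8:11] = pattern

s1, s2 = s_layer(x1, templates), s_layer(x2, templates)
top1 = c_layer(s1, pool=4).max(axis=1)  # global pooling at the top
top2 = c_layer(s2, pool=4).max(axis=1)
# s1 and s2 differ position-by-position, but the pooled top-level
# responses match: selectivity is kept while invariance is built up.
```

Stacking more S/C pairs repeats the same trade, which is the qualitative trend the Kobatake & Tanaka recordings describe along V1, V2, V4 and IT.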
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer...
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Image: 20 ms; interval (image-mask): 30 ms ISI; mask (1/f noise): 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
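The "matrix-like read-out" amounts to training a linear classifier on a matrix of population responses (recording sites x trials). A toy sketch with synthetic responses standing in for neural data (all numbers hypothetical, not the Hung et al. recordings):

```python
import numpy as np

rng = np.random.default_rng(0)

n_sites, n_trials = 64, 200
# Synthetic 'population responses': two stimulus categories whose mean
# firing patterns differ; trials are noisy observations (sites x trials).
mean_a = rng.normal(size=n_sites)
mean_b = rng.normal(size=n_sites)
labels = rng.integers(0, 2, n_trials)
means = np.where(labels[None, :] == 0, mean_a[:, None], mean_b[:, None])
R = means + 0.8 * rng.normal(size=(n_sites, n_trials))

# Linear read-out: least-squares weights w, with sign(R^T w) as the decision.
train, test = slice(0, 150), slice(150, None)
X = R[:, train].T
w, *_ = np.linalg.lstsq(X, 2 * labels[train] - 1, rcond=None)
pred = (R[:, test].T @ w > 0).astype(int)
accuracy = (pred == labels[test]).mean()  # well above chance on held-out trials
```

The point of the read-out framing is that if a simple linear decoder can recover category from the population matrix, the information is explicitly available to downstream neurons.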
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
... in 2013 ...
DeepMind
International and Corporate Partners
IIT: Cingolani
A*STAR: Chuan Poh Lim
Hebrew U: Weiss
MPI: Bülthoff
Genoa U: Verri, Rosasco
Weizmann: Ullman
Sangwan Lee
IBM, Honda, Microsoft
Boston Dynamics
Orcam, NVIDIA, Siemens
Schlumberger, Mobileye, Intel
Fujitsu
GE
KAIST
Videos: ~950 (May 2014 - April 2020)
(YouTube subscribers only - 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo? A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet A new benchmark for object recognition (in prep) Andrei Barbu David Mayo Josh Tenenbaum Boris Katz
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands. People are extremely good at this task, while networks are at near chance-level performance.
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence.
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development.
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für Biologische Kybernetik (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to the Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
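The correlation detector described above can be sketched in a few lines: each half-detector low-pass filters (delays) one receptor signal and multiplies it with the undelayed signal of the neighboring receptor, and the opponent stage subtracts the mirror-symmetric half-detector. The time constants, sampling, and 90-degree receptor phase offset below are illustrative choices, not values from the papers cited:

```python
import numpy as np

# Sketch of a Hassenstein-Reichardt correlation detector (illustrative
# constants): delay-and-multiply in each arm, opponent subtraction.

def lowpass(x, alpha=0.3):
    """First-order low-pass filter, a simple stand-in for the delay line."""
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
    return y

def reichardt(s1, s2):
    """Opponent output: positive for motion from receptor 1 toward receptor 2."""
    return np.mean(lowpass(s1) * s2 - s1 * lowpass(s2))

# A sinusoidal grating drifting past two receptors 90 degrees apart
t = np.linspace(0, 20 * np.pi, 2000)
rightward = reichardt(np.sin(t), np.sin(t - np.pi / 2))  # receptor 2 lags
leftward = reichardt(np.sin(t), np.sin(t + np.pi / 2))   # receptor 2 leads
print(rightward > 0 and leftward < 0)  # sign of the output follows direction
```

The multiplication is the essential nonlinearity: averaging a product of a delayed and an undelayed signal yields opposite signs for the two directions of motion.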
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
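The "veto" idea in the excerpt above can be illustrated with a toy simulation: excitation from one receptor is divisively suppressed (shunting inhibition) by a delayed signal from the neighboring receptor, so coincident arrival in the null direction vetoes the response. All waveforms and constants below are invented for illustration, not taken from Torre & Poggio:

```python
import numpy as np

# Toy illustration of the shunting ("veto") scheme: excitation from one
# receptor is divided down by delayed inhibition from its neighbor.

def alpha_pulse(t_on, T=100, tau=4.0):
    """Smooth conductance transient starting at time t_on."""
    t = np.arange(T, dtype=float)
    s = t - t_on
    return np.where(s >= 0, (s / tau) * np.exp(-s / tau), 0.0)

def cell_output(t_exc, t_inh, k=20.0, inh_delay=6):
    g_e = alpha_pulse(t_exc)
    g_i = alpha_pulse(t_inh + inh_delay)      # inhibitory pathway is slower
    return np.sum(g_e / (1.0 + k * g_i))      # divisive (shunting) veto

preferred = cell_output(t_exc=20, t_inh=26)   # stimulus crosses exc side first
null = cell_output(t_exc=26, t_inh=20)        # opposite direction: signals coincide
print(preferred > 2 * null)                   # direction-selective response
```

Division rather than subtraction matters here: a strong shunting conductance suppresses the response almost completely when the two signals overlap, which is the nonlinearity the paper argues for.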
© Nature Publishing Group 1985
Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
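The inverse-optics point above is easy to demonstrate numerically: a toy blur-plus-noise problem is ill-posed in exactly this sense, and a Tikhonov (regularized least-squares) solution of the general kind the paper discusses restores stability. The blur width, noise level, and regularization parameter below are illustrative choices, not values from the paper:

```python
import numpy as np

# Toy ill-posed inverse problem: data = blur(signal) + noise. The naive
# inverse amplifies noise enormously; Tikhonov regularization does not.
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 1, n)
signal = np.sin(2 * np.pi * x)

# Gaussian blur operator A: smoothing kills high frequencies, which is
# exactly what makes the inversion ill-posed
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.02) ** 2)
A /= A.sum(axis=1, keepdims=True)
data = A @ signal + 0.01 * rng.standard_normal(n)

naive = np.linalg.solve(A, data)                              # unregularized inverse
lam = 1e-3
reg = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ data)  # Tikhonov solution

def rms_err(f):
    return np.sqrt(np.mean((f - signal) ** 2))

print(rms_err(naive), rms_err(reg))  # naive error blows up; regularized stays small
```

The regularizer plays the role of the "natural constraints" in the text: it injects the prior assumption that the solution is not wildly oscillatory, making an otherwise ill-posed inversion well behaved.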
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol 194, No 4262 (Oct 15, 1976), pp. 283-287
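The algorithm of this paper can be sketched in a 1-D toy version: a network of cells C(x, d) over position and disparity, with excitation between neighboring cells at the same disparity (the continuity constraint) and inhibition between different disparities at the same position (the uniqueness constraint), iterated to a stable state. The sizes, weights, and threshold below are illustrative choices, not the paper's parameters:

```python
import numpy as np

# A 1-D toy version of the cooperative disparity network.
rng = np.random.default_rng(0)
n, n_disp, true_d = 60, 5, 2
left = rng.integers(0, 8, n)            # random-dot "left image"
right = np.roll(left, true_d)           # right image = left shifted by true_d

# Initial state: C0[x, d] = 1 wherever left(x) matches right(x + d)
C0 = np.stack([(left == np.roll(right, -d)).astype(float)
               for d in range(n_disp)], axis=1)

C = C0.copy()
eps, theta = 1.0, 3.0                   # inhibition weight, firing threshold
for _ in range(10):
    # excitation: active neighbors at the SAME disparity (continuity rule)
    excit = np.stack([np.convolve(C[:, d], np.ones(5), mode="same")
                      for d in range(n_disp)], axis=1)
    # inhibition: active cells at OTHER disparities, same position (uniqueness rule)
    inhib = C.sum(axis=1, keepdims=True) - C
    C = (excit - eps * inhib + C0 > theta).astype(float)

est = C.argmax(axis=1)                  # winning disparity at each position
print(np.bincount(est, minlength=n_disp).argmax())  # most positions settle on 2
```

Spurious matches at wrong disparities lack coherent neighborhood support and are suppressed, while the true disparity layer reinforces itself, which is the cooperative effect the paper demonstrates on random-dot stereograms.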
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Quart. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
Regularized learning in a reproducing kernel Hilbert space:

$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000; in revised form June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio (1), Ryan Rifkin (1,4), Sayan Mukherjee (1,3) & Partha Niyogi (2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^{n}. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i. The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1). Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)} consists of n independent and identically drawn samples from the distribution on Z.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and the output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as

  I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of the square loss,

  I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called the empirical error, can be computed given the training data S:

  I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

  lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,

  lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1.
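The definitions in Box 1 can be made concrete with a small numerical experiment: run ERM over affine functions, compute the empirical error I_S[f_S], and estimate the expected error I[f_S] by Monte Carlo on a large fresh sample. The distribution, noise level and sample sizes below are illustrative assumptions:

```python
import random

def sample_point():
    # One labelled example from the unknown distribution mu(x, y):
    # y = 2x + 1 plus Gaussian noise
    x = random.uniform(-1, 1)
    return (x, 2 * x + 1 + random.gauss(0, 0.5))

def erm_affine(S):
    # ERM over the hypothesis space H of affine functions f(x) = a*x + b:
    # the empirical square loss I_S[f] is minimized in closed form
    n = len(S)
    mx = sum(x for x, _ in S) / n
    my = sum(y for _, y in S) / n
    a = sum((x - mx) * (y - my) for x, y in S) / sum((x - mx) ** 2 for x, _ in S)
    b = my - a * mx
    return lambda x: a * x + b

def error(f, sample):
    # Average square loss V(f, z) = (f(x) - y)^2 over a sample
    return sum((f(x) - y) ** 2 for x, y in sample) / len(sample)

random.seed(0)
S = [sample_point() for _ in range(200)]
f_S = erm_affine(S)
empirical = error(f_S, S)                                       # I_S[f_S]
expected = error(f_S, [sample_point() for _ in range(20000)])   # Monte Carlo estimate of I[f_S]
gap = abs(expected - empirical)
```

With n = 200 the gap |I[f_S] − I_S[f_S]| is already small; rerunning with growing n shows it shrinking, which is the generalization property defined above.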
Letters to Nature
Nature, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual faces; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes.
Kobatake & Tanaka, 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9520 spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Image (20 ms), image-mask interval (ISI 30 ms), mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models 'predict' rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
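The 'matrix-like read-out' of Hung et al. is, in essence, a linear classifier applied to a population response vector. The sketch below is a toy illustration of that idea, a perceptron trained on synthetic population responses; it is not the paper's actual classifier, data or noise model:

```python
import random

def train_linear_readout(trials, labels, epochs=50, lr=0.1):
    """Perceptron-style linear readout: category is decoded as
    sign(w . r + b) from a population response vector r."""
    dim = len(trials[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for r, y in zip(trials, labels):
            pred = 1 if sum(wi * ri for wi, ri in zip(w, r)) + b > 0 else -1
            if pred != y:
                w = [wi + lr * y * ri for wi, ri in zip(w, r)]
                b += lr * y
    return w, b

# Two stimulus categories drive partially overlapping sub-populations;
# trial-to-trial variability is modelled as Gaussian noise on the pattern.
random.seed(1)
proto = {+1: [1, 1, 0, 0], -1: [0, 0, 1, 1]}
make = lambda y: [p + random.gauss(0, 0.3) for p in proto[y]]
labels = [random.choice([+1, -1]) for _ in range(100)]
trials = [make(y) for y in labels]
w, b = train_linear_readout(trials, labels)
acc = sum((1 if sum(wi * ri for wi, ri in zip(w, r)) + b > 0 else -1) == y
          for r, y in zip(trials, labels)) / len(labels)
```

The point of the slide is that such a simple linear decoder, applied to a few hundred IT sites, already recovers category and identity across position and scale changes.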
… in 2013 …
Videos: ~950 (May 2014 - April 2020)
(of YouTube subscribers only; 18% of viewers)
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman. Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz.
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu.
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy on reconstructing occluded portions of hands: people are extremely good at this task, while networks are at near chance-level performance.
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a 'deep' introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, the Hidary Foundation and Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen: MPI für Biologische Kybernetik (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion: algorithms and circuits; the beetle (and the fly); relative motion: algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction.
A cognitive theory of basic fly instincts predicts the trajectory of the chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to the Bayesian approach to cognition in humans… no neurons)
• Motion: algorithms and circuits; the beetle (and the fly); relative motion: algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on the anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ('energy') model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
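The correlation scheme in the bullets above (delay-and-multiply between neighbouring photoreceptors, followed by opponent subtraction) can be simulated in a few lines. This is an illustrative sketch of a Hassenstein-Reichardt detector with assumed receptor spacing, delay and stimulus; the sign of the time-averaged output reports the direction of a drifting grating:

```python
import math

def reichardt_response(stimulus_at, dt=0.01, T=4.0, delay=0.05, dx=0.1):
    """Hassenstein-Reichardt correlation detector (sketch): two inputs
    separated by dx; each arm multiplies one signal by the delayed signal
    of the other; the opponent difference is averaged over time.
    A positive mean indicates motion from receptor 1 toward receptor 2."""
    steps = int(T / dt)
    lag = int(delay / dt)
    s1 = [stimulus_at(0.0, k * dt) for k in range(steps)]
    s2 = [stimulus_at(dx, k * dt) for k in range(steps)]
    out = [s1[k - lag] * s2[k] - s2[k - lag] * s1[k] for k in range(lag, steps)]
    return sum(out) / len(out)

# Sinusoidal grating drifting rightwards (v > 0) or leftwards (v < 0)
grating = lambda v: (lambda x, t: math.sin(2 * math.pi * (x - v * t)))
right = reichardt_response(grating(+1.0))
left = reichardt_response(grating(-1.0))
```

Reversing the stimulus velocity flips the sign of the averaged output, which is the behavioural signature the beetle experiments measured.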
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion: algorithms and circuits; the beetle (and the fly); relative motion: algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species, with a variety of techniques from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
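The inhibitory 'veto' scheme of Barlow & Levick, which the paper models as a shunting (divisive) synaptic interaction, can be caricatured in a few lines: excitation from one receptor is divided down whenever the delayed signal from its neighbour is active. The stimulus sequences, delay and inhibitory gain below are illustrative assumptions, not fitted parameters:

```python
def veto_response(seq_a, seq_b, delay=2):
    """Barlow-Levick-style veto scheme (sketch): excitation g_e from
    receptor A is divisively suppressed ('vetoed') when the delayed
    signal g_i from neighbouring receptor B is active, approximating a
    shunting-inhibition interaction g_e / (1 + k * g_i)."""
    k = 10.0  # strength of the shunting inhibition
    out = 0.0
    for t in range(delay, len(seq_a)):
        g_e = seq_a[t]
        g_i = seq_b[t - delay]
        out += g_e / (1.0 + k * g_i)
    return out

# A bright edge moving A->B (preferred) vs B->A (null); 1 = illuminated
a_first = [0, 0, 1, 1, 0, 0, 0, 0]
b_later = [0, 0, 0, 0, 1, 1, 0, 0]
preferred = veto_response(a_first, b_later)  # A leads: inhibition arrives too late
null = veto_response(b_later, a_first)       # B leads: delayed veto coincides with A
```

Motion in the null direction aligns the delayed inhibition with the excitation and suppresses the output, while the multiplicative Reichardt scheme achieves direction selectivity through correlation instead; the paper's point is that a single synaptic nonlinearity can implement the veto.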
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
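The regularization recipe the article describes, adding a smoothness term to make an ill-posed inverse problem well-posed, can be illustrated on a one-dimensional surface-reconstruction toy problem. The data, the weight λ, and the relaxation scheme below are illustrative assumptions; the local update rule is the kind of parallel computation the authors map onto analog resistive networks:

```python
def reconstruct(data, lam=5.0, iters=2000):
    """Tikhonov-regularized 1-D surface reconstruction (sketch):
    minimize sum_i (u_i - d_i)^2 + lam * sum_i (u_{i+1} - u_i)^2
    by Gauss-Seidel relaxation. Each update uses only a node's data
    term and its two neighbours, i.e. a local, parallelizable rule."""
    n = len(data)
    u = list(data)
    for _ in range(iters):
        for i in range(n):
            num = data[i]
            den = 1.0
            if i > 0:
                num += lam * u[i - 1]
                den += lam
            if i < n - 1:
                num += lam * u[i + 1]
                den += lam
            u[i] = num / den
    return u

# A step edge corrupted by noise; the regularizer suppresses the noise
# while keeping the overall rise of the underlying surface
noisy = [0.0, 0.2, -0.2, 0.1, 1.1, 0.9, 1.2, 0.8]
u = reconstruct(noisy)
```

Larger λ yields smoother reconstructions at the price of blurring genuine discontinuities, which is why the article treats the detection of motion or intensity discontinuities as a separate, harder problem.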
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
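The contrast between the two schemes can be made concrete in a few lines of code. This is a toy sketch, not the paper's biophysical model: binary impulse responses, an idealized delay line, a bare multiplication for the Hassenstein-Reichardt correlator and an idealized gate for the Barlow-Levick 'veto' (which the paper proposes to implement with shunting inhibition). All names and parameter values are invented for the illustration.

```python
import numpy as np

def sweep(first, second, n=40, t0=10, dt=3):
    """Two adjacent receptors; a moving edge hits `first` at t0, `second` at t0+dt."""
    r = {1: np.zeros(n), 2: np.zeros(n)}
    r[first][t0] = 1.0
    r[second][t0 + dt] = 1.0
    return r[1], r[2]

def delayed(x, dt=3):
    """Idealized delay line (the 'long time constant' channel of figure 1)."""
    return np.concatenate([np.zeros(dt), x[:-dt]])

def correlator(r1, r2):
    """Hassenstein & Reichardt: multiply the delayed channel-1 signal with channel 2."""
    return float(np.sum(delayed(r1) * r2))

def veto(r1, r2):
    """Barlow & Levick: channel-1 excitation passes unless 'vetoed' by a delayed
    inhibitory signal from channel 2 (an idealized gate standing in for the
    shunting interaction proposed in the paper)."""
    return float(np.sum(r1 * (1.0 - delayed(r2))))

pref = sweep(first=1, second=2)  # motion in the preferred direction
null = sweep(first=2, second=1)  # motion in the null direction
# Both detectors respond in the preferred direction (1.0) and not in the null (0.0).
```

Despite their different mechanisms (conjunction of excitation versus delayed inhibition), both toy detectors are directionally selective, which is why behavioural data alone could not separate them.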
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D. Marr; T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
— computation
— algorithms
— biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖²_K ]
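For the square loss, a regularization functional of this form has a closed-form minimizer over a reproducing-kernel space: by the representer theorem, f(x) = Σ_i c_i K(x, x_i) with c = (K + nμI)⁻¹ y. A minimal numpy sketch with a Gaussian kernel (the data, σ and μ below are invented for the illustration):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Pairwise squared distances via broadcasting, then the Gaussian kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)

# Minimize (1/n) sum_i V(y_i, f(x_i)) + mu * ||f||_K^2 with the square loss:
# coefficients c = (K + n*mu*I)^(-1) y  (representer theorem).
n, mu = len(X), 1e-3
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + n * mu * np.eye(n), y)

# Evaluate the learned function on fresh points.
X_test = rng.uniform(-1, 1, (200, 1))
y_hat = gaussian_kernel(X_test, X) @ c
mse = ((y_hat - np.sin(3 * X_test[:, 0])) ** 2).mean()  # small: the fit generalizes
```

The regularization parameter μ trades off fidelity to the training data against the smoothness (RKHS norm) of the solution, which is exactly the trade-off the functional above expresses.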
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
c2001 American Mathematical Society
1
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P{|X_n − X| > ε} = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = (z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)).

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss, V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P{ I[f_S] > inf_{f∈H} I[f] + ε } = 0.
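The stability property at the heart of this paper can be probed numerically: train a stable algorithm on S and on S with one example deleted, and measure how much the learned hypothesis moves. The sketch below uses regularized least squares as a stand-in for a stable learning map; the data, λ and the particular stability measure are invented for the illustration, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=0.1):
    # Regularized least squares: w = (X^T X + lam*n*I)^(-1) X^T y
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

def loo_stability(n, lam=0.1):
    """Largest change in predictions when any single training point is deleted."""
    X = rng.standard_normal((n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
    w_full = ridge_fit(X, y, lam)
    diffs = []
    for i in range(n):
        mask = np.arange(n) != i
        w_i = ridge_fit(X[mask], y[mask], lam)
        diffs.append(np.abs(X @ (w_full - w_i)).max())
    return max(diffs)

small, large = loo_stability(20), loo_stability(200)
# The hypothesis moves less when the training set is larger: this decay of the
# leave-one-out perturbation with n is the stability that implies generalization.
```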
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database
• 1000+ Real, 3000+ VIRTUAL
• 500,000+ Non-Face Patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
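The prediction can be caricatured as a small network of view-tuned units (Gaussian bumps centered on stored example views) pooled by a view-invariant output unit, in the spirit of the Poggio-Edelman model. The stored views, tuning width and max pooling below are invented for the illustration:

```python
import numpy as np

angles = np.linspace(-180.0, 180.0, 721)   # test viewpoints in 0.5-degree steps

def view_tuned(theta, preferred, width=30.0):
    """Gaussian tuning around a stored view, using wrapped angular distance."""
    d = (theta - preferred + 180.0) % 360.0 - 180.0
    return np.exp(-d ** 2 / (2.0 * width ** 2))

# A handful of learned example views of one object.
stored_views = [-120.0, -60.0, 0.0, 60.0, 120.0, 180.0]
responses = np.array([view_tuned(angles, v) for v in stored_views])

# A view-invariant unit pools the view-tuned units (max pooling here).
invariant_unit = responses.max(axis=0)
```

Each view-tuned unit responds maximally at its preferred view and falls off as the object rotates away, as in the recordings; the pooled unit stays high at every viewpoint, mirroring the small number of view-invariant cells found for well-trained objects.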
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image: 20 ms
Image-mask interval (ISI): 30 ms
Mask (1/f noise): 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
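Models of this family alternate two operations: template matching ('S' layers, which build selectivity) and max pooling ('C' layers, which build invariance). A toy numpy sketch, with invented template and image sizes, showing that max pooling gives the same top-level response when the stimulus shifts position:

```python
import numpy as np

def s_layer(image, template):
    """'S' operation: template matching (plain correlation) at every position."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + h, j:j + w] * template).sum()
    return out

def c_layer(s, pool=4):
    """'C' operation: max pooling over local neighbourhoods."""
    H, W = s.shape
    return np.array([[s[i:i + pool, j:j + pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

t = np.zeros((3, 3)); t[:, 1] = 1.0   # a small vertical-bar template

def image_with_bar(row, col):
    img = np.zeros((16, 16))
    img[row:row + 3, col] = 1.0
    return img

r_a = c_layer(s_layer(image_with_bar(4, 4), t)).max()
r_b = c_layer(s_layer(image_with_bar(4, 8), t)).max()
# r_a == r_b: the top-level response is unchanged by the position shift.
```

Stacking several such S/C pairs is what produces the gradual growth of receptive field size, preferred-stimulus complexity and invariance seen along V1, V2, V4 and IT.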
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
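The "matrix-like read-out" amounts to a linear classifier applied to a population activity vector. A sketch on synthetic data (the simulated 'population', trial counts and the least-squares classifier are invented stand-ins for the recorded IT responses and the classifiers actually used):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 'population responses': 50 units whose mean rates differ by category.
n_units, n_trials = 50, 200
mu_a = rng.normal(0.0, 1.0, n_units)
mu_b = rng.normal(0.0, 1.0, n_units)
X = np.vstack([rng.normal(mu_a, 1.0, (n_trials, n_units)),
               rng.normal(mu_b, 1.0, (n_trials, n_units))])
y = np.concatenate([np.ones(n_trials), -np.ones(n_trials)])

# Linear readout: a least-squares classifier trained on half the trials.
idx = rng.permutation(2 * n_trials)
train, test = idx[:n_trials], idx[n_trials:]
Xb = np.hstack([X, np.ones((2 * n_trials, 1))])   # append a bias term
w, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
acc = (np.sign(Xb[test] @ w) == y[test]).mean()   # near-perfect on this easy toy
```

The point is not the classifier but the code's linearity: if category can be decoded this way from a population vector, downstream neurons could read it out with a single weighted sum.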
... in 2013 ...
Ellen Hildreth
Mandana Sassanfar
Diversity Program
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task
Thomas Miconi, Laura Groomes and Gabriel Kreiman
Cerebral Cortex, 2016
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu
A dataset of RGB images of hands holding objects and interacting with objects Measured human accuracy on reconstructing occluded portions of hands People are extremely good at this task while networks are at near chance-level performance
Summer Course at Woods Hole: our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence.
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted.
Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu.
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels:
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis and 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Muller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels:
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light; how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
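The Hassenstein-Reichardt scheme is delay-and-correlate: the signal from one photoreceptor is delayed and multiplied with the signal from its neighbor, and the two mirror-image subunits are subtracted so the sign of the output encodes direction. A minimal sketch on a drifting sinusoid (the time constants and stimulus are illustrative choices, not fit to any data):

```python
import numpy as np

# Two adjacent "photoreceptors" viewing a drifting sinusoid; the spatial
# offset between them shows up as a phase lag between their signals.
t = np.arange(0, 10, 0.01)
omega, phase_lag = 2 * np.pi, 0.5

def responses(direction):
    # direction +1: the pattern reaches receptor A before receptor B
    a = np.sin(omega * t)
    b = np.sin(omega * t - direction * phase_lag)
    return a, b

def reichardt(a, b, delay=25):
    # delay-and-correlate: A's delayed signal times B's current signal,
    # minus the mirror-image arm (opponency); the sign encodes direction
    corr_ab = a[:-delay] * b[delay:]
    corr_ba = b[:-delay] * a[delay:]
    return float(np.mean(corr_ab - corr_ba))

for d in (+1, -1):
    a, b = responses(d)
    print(f"direction {d:+d}: detector output {reichardt(a, b):+.3f}")
```

Because each subunit only correlates two local signals, the detector never "sees" the full pattern, yet the opponent output reliably flips sign with the direction of motion.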
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels:
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Universita di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut fuer biologische Kybernetik, Tuebingen, Germany
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA; Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
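The normal/tangential decomposition behind this "aperture problem" can be made concrete with a few lines of linear algebra; the velocity and contour orientation below are arbitrary illustrative numbers:

```python
import numpy as np

# True image velocity of a point on a smooth contour, and the local
# contour orientation (all numbers are arbitrary illustrative choices).
V = np.array([3.0, 1.0])
theta = np.deg2rad(30.0)
tangent = np.array([np.cos(theta), np.sin(theta)])
normal = np.array([-np.sin(theta), np.cos(theta)])

# A purely local motion measurement returns only the normal component ...
v_normal = (V @ normal) * normal

# ... so any velocity differing by a tangential component yields the
# same local measurement: the full velocity field is underdetermined.
for c in (0.0, 1.0, -2.5):
    assert np.allclose(((V + c * tangent) @ normal) * normal, v_normal)

print("measured normal component:", v_normal)
```

The whole one-parameter family V + c·tangent is indistinguishable locally, which is exactly why recovering the full flow requires an added constraint such as smoothness.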
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tuebingen
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
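The cooperative idea can be sketched for a 1D random-dot pair: cells vote for disparities, with excitation among neighboring cells at the same disparity and inhibition between competing disparities at the same location. This is a schematic toy version with illustrative parameters, not the paper's exact update rule:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1D toy "random-dot stereogram": the right image is the left image
# circularly shifted by a single true disparity.
N, d_true, n_disp = 200, 2, 5
left = rng.integers(0, 2, N)
right = np.roll(left, -d_true)                     # right[i] = left[i + d_true]

# Match network: C[d, i] = 1 wherever disparity d is locally consistent.
C = np.array([(left == np.roll(right, d)).astype(float) for d in range(n_disp)])

# Cooperative iteration: excitation from neighbors at the same disparity,
# inhibition from competing disparities at the same position.
for _ in range(3):
    excite = sum(np.roll(C, k, axis=1) for k in (-2, -1, 0, 1, 2))
    inhibit = C.sum(axis=0, keepdims=True) - C
    C = (excite - 0.1 * inhibit >= 4.5).astype(float)

winner = int(np.argmax(C.sum(axis=1)))
print("recovered disparity:", winner)
```

False matches lack spatially coherent support at their disparity and die out over iterations, while the true disparity layer sustains itself: the global interpretation emerges from purely local interactions.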
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardtrsquos scientific legacy Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels (computation; algorithms; biophysics and circuits)
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977 …
• … part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I) …
• … which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2 \right]$$
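For the square loss, this Tikhonov regularization functional has the closed-form minimizer given by the representer theorem: f(x) = Σᵢ cᵢ K(x, xᵢ) with (K + μℓI)c = y. A small numpy sketch (the kernel width, μ, and the target function are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Representer theorem for the square loss: the minimizer of
#   (1/l) * sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2
# is f(x) = sum_i c_i K(x, x_i), where (K + mu*l*I) c = y.
def gaussian_kernel(a, b, sigma=0.15):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))

l = 40
x = np.sort(rng.uniform(0, 1, l))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=l)   # noisy samples of sin

mu = 1e-3
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + mu * l * np.eye(l), y)

x_test = np.linspace(0, 1, 200)
f_test = gaussian_kernel(x_test, x) @ c                # regularized estimate

err = np.mean((f_test - np.sin(2 * np.pi * x_test)) ** 2)
print(f"MSE vs. true function: {err:.4f}")
```

The μℓI term is what makes the inverse problem well-posed: it trades a little data fit for a solution that is stable under noise in the samples.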
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear
We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of
languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-
ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])
(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice
Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio (1,4), Ryan Rifkin (1,4), Sayan Mukherjee (1,3) & Partha Niyogi (2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left(|X_n - X| > \varepsilon\right) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \left| I[f_S] - I_S[f_S] \right| = 0 \text{ in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and for any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] - \inf_{f \in \mathcal{H}} I[f] \leq \varepsilon \right) = 1.$$
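The definitions in Box 1 can be made concrete with a small Monte-Carlo experiment. The sketch below is an assumed toy setup, not from the paper: ERM over constant functions under the square loss, where for Y ~ N(0, 1) the expected error of the constant c is known exactly, E[(Y - c)^2] = 1 + c^2, so the generalization gap |I[f_S] - I_S[f_S]| can be computed and seen to shrink with n:

```python
import numpy as np

# ERM over the class of constant functions: f_S is the sample mean.
# Empirical error I_S[f_S] is the training mean-squared error; the
# expected error I[f_S] = 1 + f_S^2 is exact for Y ~ N(0, 1).
rng = np.random.default_rng(2)

def mean_gap(n, trials=200):
    gaps = []
    for _ in range(trials):
        y = rng.standard_normal(n)       # training outputs drawn from mu
        f_S = y.mean()                   # ERM minimizer of the empirical error
        I_S = ((y - f_S) ** 2).mean()    # empirical error
        I = 1.0 + f_S ** 2               # expected error, known in closed form
        gaps.append(abs(I - I_S))
    return float(np.mean(gaps))

g_small, g_large = mean_gap(10), mean_gap(1000)
```

As Box 1's definition of generalization requires, the average gap decays toward zero as the training set grows (roughly like 1/sqrt(n) here).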
Letters to Nature. NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
• ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Task timeline: image, 20 ms; inter-stimulus interval (image–mask), 30 ms; mask (1/f noise), 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
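A minimal caricature of such a hierarchy (in the spirit of HMAX-style models, but not the actual Serre et al. implementation): a "simple" layer template-matches the input at every position, and a "complex" layer max-pools over positions, trading selectivity for position invariance. All shapes and values below are invented for illustration:

```python
import numpy as np

def simple_layer(image, template):
    # "Simple" cells: dot product of the template with every image patch
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + h, j:j + w] * template).sum()
    return out

def complex_layer(s):
    # "Complex" cells: max pooling over positions -> position tolerance
    return s.max()

# The same bar at two different positions evokes the same top response
template = np.zeros((3, 3)); template[1, :] = 1.0   # horizontal-bar template
img1 = np.zeros((8, 8)); img1[2, 1:4] = 1.0         # bar near top-left
img2 = np.zeros((8, 8)); img2[5, 4:7] = 1.0         # same bar, shifted
r1 = complex_layer(simple_layer(img1, template))
r2 = complex_layer(simple_layer(img2, template))
```

Stacking several such simple/complex pairs yields units that are simultaneously more selective (larger templates) and more invariant (larger pooling ranges), the trend seen along V1, V2, V4, IT.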
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
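The "matrix-like read-out" idea, a linear classifier applied to a population response vector, can be sketched on simulated data. The population model, noise level and trial counts below are invented for illustration (Hung et al. used recorded IT responses, not a simulation):

```python
import numpy as np

# Each of two object categories evokes a characteristic mean firing
# pattern across 50 units; single trials are noisy versions of it.
rng = np.random.default_rng(3)
units, trials = 50, 200
patterns = rng.standard_normal((2, units))     # category tuning vectors

def simulate(n):
    labels = rng.integers(0, 2, n)
    responses = patterns[labels] + rng.standard_normal((n, units))
    return responses, labels

R_train, y_train = simulate(trials)
R_test, y_test = simulate(trials)

# Least-squares linear readout: fit w to the +/-1 labels, decode by sign
w, *_ = np.linalg.lstsq(R_train, 2.0 * y_train - 1.0, rcond=None)
acc = float(((R_test @ w > 0).astype(int) == y_test).mean())
```

The point of the readout approach is that a downstream area needs only a weighted sum of the population activity to recover category, which is why a plain linear decoder is the right tool.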
… in 2013 …
EAC- May 2020
Code Software and Datasets
There's Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task. Thomas Miconi, Laura Groomes and Gabriel Kreiman.
Cerebral Cortex, 2016.
See more at: http://klab.tch.harvard.edu/resources/miconietal_visualsearch_2016.html
ObjectNet: A new benchmark for object recognition (in prep). Andrei Barbu, David Mayo, Josh Tenenbaum, Boris Katz
Existing object detection benchmarks overstate the performance of machines and understate the performance of humans. We are creating a dataset that removes biases and shows that machines are far inferior to humans when detecting objects.
Partially Occluded Hands. B. Myanganbayar, C. Mata, G. Dekel, B. Katz, G. Ben-Yosef, A. Barbu
A dataset of RGB images of hands holding objects and interacting with objects. We measured human accuracy at reconstructing occluded portions of hands: people are extremely good at this task, while networks perform at near chance level.
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course that gives advanced students a "deep" introduction to the problem of intelligence
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted
Sponsored fellowships by GoogleX, Hidary Foundation and Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM SummerSchool
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center); photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca Domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory: Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
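The Reichardt detector named in these bullets, delay-and-correlate between two neighbouring photoreceptors with an opponent subtraction of the mirror-symmetric subunit, can be sketched in a few lines. A first-order low-pass filter stands in for the delay; the sinusoidal stimulus and all constants are illustrative choices:

```python
import numpy as np

def lowpass(x, tau=5.0):
    # First-order low-pass filter: the "delay" channel of each subunit
    y, out = 0.0, []
    for v in x:
        y += (v - y) / tau
        out.append(y)
    return np.array(out)

def reichardt(s1, s2, tau=5.0):
    # Opponent correlator: delayed s1 times s2, minus the mirror subunit.
    # By construction reichardt(a, b) == -reichardt(b, a).
    return float(np.mean(lowpass(s1, tau) * s2 - lowpass(s2, tau) * s1))

# A drifting sinusoid sampled at two nearby points on the eye:
# the second receptor sees the same pattern slightly later.
t = np.arange(500)
s_lead = np.sin(0.2 * t)
s_lag = np.sin(0.2 * t - 0.3)
pref = reichardt(s_lead, s_lag)   # motion in the preferred direction
null = reichardt(s_lag, s_lead)   # opposite direction: equal and opposite
```

The multiplication is the essential nonlinearity identified by Hassenstein & Reichardt; a purely linear detector would average to zero for both directions.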
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
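The 'veto' idea, inhibition that divisively suppresses (shunts) excitation when the two signals coincide rather than merely subtracting from it, can be caricatured numerically. The sketch below is not the biophysical model of the paper, just a toy with an assumed delay and shunting constant:

```python
import numpy as np

def response(exc, inh_src, delay=3, k=20.0):
    # Delayed inhibition from the neighbouring receptor; the divisive
    # form exc / (1 + k * inh) "vetoes" excitation that it coincides with,
    # unlike a linear subtraction.
    inh = np.roll(inh_src, delay)
    inh[:delay] = 0.0
    return float(np.sum(exc / (1.0 + k * inh)))

def bar(onset, n=60, width=5):
    # A bar of light crossing a receptor: a rectangular pulse in time
    s = np.zeros(n)
    s[onset:onset + width] = 1.0
    return s

# Null direction: the bar reaches receptor B first, so B's delayed
# inhibition arrives exactly when A is excited, and the response is vetoed.
null = response(exc=bar(13), inh_src=bar(10))
# Preferred direction: the bar reaches A first; inhibition arrives too late.
pref = response(exc=bar(10), inh_src=bar(13))
```

The asymmetry between the two sweep directions comes entirely from the delayed, divisive interaction, which is the essence of the veto scheme.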
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch, and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
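The regularization recipe for such ill-posed problems, adding a smoothness term (a stabilizer) to an underdetermined data term and solving the resulting well-posed linear system, can be sketched on the surface-reconstruction module from the box above. The 1-D setup, sampling pattern and regularization weight are illustrative choices, not from the paper:

```python
import numpy as np

def reconstruct(n, sample_idx, depths, lam=10.0):
    # Minimize  sum_i (z[x_i] - d_i)^2 + lam * sum_x (z[x+1] - z[x])^2.
    # Without the smoothness term the problem is underdetermined (most
    # pixels are unconstrained); with it, the minimizer solves a sparse,
    # well-posed linear system (A + lam * D^T D) z = b.
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[sample_idx, sample_idx] = 1.0          # data term at sampled pixels
    b[sample_idx] = depths
    D = np.diff(np.eye(n), axis=0)           # first-difference operator
    return np.linalg.solve(A + lam * D.T @ D, b)

# Sparse, noisy depth samples of a smooth ramp
n = 100
idx = np.arange(0, n, 10)
true = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(4)
z = reconstruct(n, idx, true[idx] + 0.01 * rng.standard_normal(len(idx)))
err = float(np.abs(z - true).mean())
```

The same pattern, a quadratic data term plus a quadratic stabilizer, covers the other early-vision modules too; what changes is the operator in the smoothness term and the measurement model in the data term.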
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
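The two schemes of figure 1 can be contrasted in a few lines of code. This is an illustrative sketch, not from the paper: the discrete-time pulse stimulus, the two-sample delay, and the half-detector readouts are all assumptions. The point is only that both the multiplicative (Hassenstein-Reichardt) unit and the inhibitory 'veto' (Barlow-Levick) unit respond in the preferred direction and stay silent in the null direction.

```python
# Sketch (illustrative assumptions, not the paper's model): two elementary
# motion-detector schemes watching a pulse that steps from receptor 1 to
# receptor 2 (preferred direction) or from 2 to 1 (null direction).

def delay(signal, steps):
    """Delay a discrete-time signal by `steps` samples (zero-padded)."""
    return [0.0] * steps + signal[: len(signal) - steps]

def reichardt_halfunit(r1, r2, d=2):
    """Multiplicative scheme: correlate the delayed receptor-1 signal
    with the direct receptor-2 signal."""
    return sum(a * b for a, b in zip(delay(r1, d), r2))

def barlow_levick_halfunit(r1, r2, d=2):
    """Veto scheme: receptor-1 excitation passes unless the delayed
    receptor-2 signal is simultaneously active (inhibitory 'veto')."""
    r2d = delay(r2, d)
    return sum(a for a, b in zip(r1, r2d) if b == 0)

pulse = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
preferred = (pulse, delay(pulse, 2))   # receptor 2 sees the pulse 2 steps later
null = (delay(pulse, 2), pulse)        # receptor 2 sees it first
```

With these inputs, both half-units give a positive response for `preferred` and zero for `null`: the correlator because delayed excitation coincides multiplicatively, the veto unit because delayed inhibition arrives too late to cancel the excitation.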
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
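A cooperative stereo iteration of this kind can be illustrated on 1-D random-dot images. This is a simplified sketch under assumed parameters (neighborhood size, inhibition weight, threshold), not the paper's 2-D network: each node (x, d) is excited by active neighbors at the same disparity (continuity constraint) and inhibited by active nodes at other disparities for the same position (uniqueness constraint).

```python
import random

random.seed(0)

# Sketch of a Marr-Poggio-style cooperative stereo iteration on 1-D
# random-dot images. The neighborhood, inhibition weight epsilon and
# threshold theta are illustrative assumptions, not the paper's values.

N, DMAX, SHIFT = 40, 3, 2          # image length, disparity range, true shift
left = [random.randint(0, 1) for _ in range(N)]
right = [left[(i - SHIFT) % N] for i in range(N)]   # right = shifted left image

def initial_state():
    """1 wherever a left dot matches a right dot at disparity d."""
    return {(x, d): 1 if left[x] == right[(x + d) % N] == 1 else 0
            for x in range(N) for d in range(-DMAX, DMAX + 1)}

def update(C, epsilon=0.6, theta=2.0):
    """One cooperative iteration: excitation along the same disparity
    (continuity), inhibition across disparities at the same position
    (uniqueness), then a threshold."""
    C0 = initial_state()
    new = {}
    for (x, d) in C:
        excite = sum(C[((x + k) % N, d)] for k in (-3, -2, -1, 1, 2, 3))
        inhibit = sum(C[(x, dd)] for dd in range(-DMAX, DMAX + 1) if dd != d)
        new[(x, d)] = 1 if excite - epsilon * inhibit + C0[(x, d)] >= theta else 0
    return new

C = initial_state()
for _ in range(8):
    C = update(C)

# Surviving matches tend to concentrate at the true disparity d = SHIFT,
# since spurious matches get weaker continuity support and more inhibition.
survivors = [d for (x, d), v in C.items() if v == 1]
```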
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
mdash computation mdash algorithms mdash biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
min_{f∈H} [ (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + μ ‖f‖²_K ]
Predictive regularization algorithms
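A regularized least-squares scheme of this kind can be sketched in a few lines. This is a minimal illustration under assumed choices (Gaussian kernel, square loss, an illustrative μ), not the exact algorithms referenced on the slide: for square loss, the representer theorem gives the minimizer as f(x) = Σ_j c_j K(x, x_j) with c = (K + μℓI)⁻¹ y.

```python
import math

# Sketch of regularized least squares in an RKHS (illustrative parameters):
# minimize (1/l) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over f in H_K.

def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    c = [0.0] * n
    for k in reversed(range(n)):
        c[k] = (M[k][n] - sum(M[k][j] * c[j] for j in range(k + 1, n))) / M[k][k]
    return c

def fit(xs, ys, mu=1e-3):
    """Representer-theorem solution: c = (K + mu*l*I)^{-1} y."""
    n = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j]) + (mu * n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = solve(A, ys)
    return lambda x: sum(cj * gaussian_kernel(x, xj) for cj, xj in zip(c, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
f = fit(xs, ys)
```

The regularization parameter μ trades training error against the smoothness penalty ‖f‖²_K; with a small μ the fit passes close to the training points.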
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²
¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1-5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
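The stability property can be probed empirically for a simple regularized algorithm. This is an illustrative sketch, not the paper's formal CV-loo stability definition: fit ridge-regularized 1-D least squares with and without each training point, and measure how much the prediction at the deleted point changes. The data-generating process and λ are assumptions.

```python
import random

random.seed(1)

# Sketch: leave-one-out perturbation of a regularized learner. For a stable
# algorithm, deleting one example barely changes the learned hypothesis.

def fit_ridge(points, lam=1.0):
    """Minimize sum_i (y_i - w*x_i)^2 + lam*w^2; closed-form slope w."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / (sxx + lam)

# Nearly linear data: y = 2x plus a little Gaussian noise.
S = [(x / 10, 2 * (x / 10) + random.gauss(0, 0.1)) for x in range(1, 21)]
w_full = fit_ridge(S)

# Change in the prediction at each deleted point.
changes = []
for i, (xi, yi) in enumerate(S):
    w_i = fit_ridge(S[:i] + S[i + 1:])
    changes.append(abs(w_full * xi - w_i * xi))

beta = max(changes)   # an empirical stability estimate; shrinks as n grows
```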
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i), i = 1, …, n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n - X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P{ |X_n - X| > ε } = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

  S = z_1 = (x_1, y_1), …, z_n = (x_n, y_n)

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)².

Expected error. The expected error of a function f is defined as

  I[f] = ∫_Z V(f, z) dμ(z)

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

  I[f] = ∫_{X×Y} (f(x) - y)² dμ(x, y)

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

  I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

  lim_{n→∞} |I[f_S] - I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,

  lim_{n→∞} P{ I[f_S] - inf_{f∈H} I[f] > ε } = 0
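The quantities in Box 1 are easy to instantiate in code. A minimal sketch under assumed choices (a toy data distribution and a small finite hypothesis space of threshold functions, neither from the paper): square loss, the empirical error I_S[f], and ERM selecting the empirical-error minimizer.

```python
import random

random.seed(2)

# Sketch of Box 1's quantities: square loss, empirical error I_S[f], and
# ERM over a toy finite hypothesis space of threshold classifiers.

def square_loss(f, z):
    x, y = z
    return (f(x) - y) ** 2

def empirical_error(f, S):
    """I_S[f] = (1/n) * sum_i V(f, z_i)."""
    return sum(square_loss(f, z) for z in S) / len(S)

# Assumed data distribution: y = 1[x > 0.6], with 5% label noise.
S = []
for _ in range(200):
    x = random.random()
    y = (1 if x > 0.6 else 0) if random.random() > 0.05 else random.randint(0, 1)
    S.append((x, y))

# Hypothesis space H: threshold classifiers x -> 1[x > t], t on a grid.
H = [lambda x, t=t / 20: 1 if x > t else 0 for t in range(21)]

# ERM selects the hypothesis minimizing the training (empirical) error.
f_S = min(H, key=lambda f: empirical_error(f, S))
```

ERM here recovers a threshold near the true 0.6; the stability result in the letter asks when such a minimizer's training error is predictive of its expected error.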
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work
• Training database • 1000+ real, 3000+ virtual • 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
- ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995, Vol 5, No 5, 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Trial sequence: Image (20 ms), Interval Image-Mask (30 ms ISI), Mask 1/f noise (80 ms)
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
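The core architectural motif of these models, alternating template matching ("simple", S) layers and max-pooling ("complex", C) layers in the spirit of Riesenhuber & Poggio 1999, can be sketched minimally. The 1-D image, toy template, and pooling range below are illustrative assumptions: max pooling keeps the best S response in a neighborhood, so the C output tolerates small shifts of the stimulus.

```python
# Sketch of an HMAX-like S/C layer pair (illustrative, 1-D, single template).

def s_layer(image, template):
    """Template matching (dot product) at every position: 'simple' units."""
    t = len(template)
    return [sum(image[i + k] * template[k] for k in range(t))
            for i in range(len(image) - t + 1)]

def c_layer(responses, pool=4):
    """Max pooling over local neighborhoods: 'complex' units, tolerant to
    small shifts because only the best S response in each window survives."""
    return [max(responses[i:i + pool]) for i in range(0, len(responses), pool)]

template = [1.0, -1.0, 1.0]          # a toy feature detector

def features(image):
    return c_layer(s_layer(image, template))

# The same pattern at two nearby positions yields the same C-layer output:
img_a = [1, -1, 1, 0, 0, 0, 0, 0, 0, 0]
img_b = [0, 1, -1, 1, 0, 0, 0, 0, 0, 0]
```

Stacking such S/C pairs, with larger templates and pooling ranges at each stage, yields the gradual build-up of selectivity and invariance described for the ventral stream.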
Decoding the neural code: matrix-like read-out from the brain
Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
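The linear-readout idea can be sketched on simulated data. Everything below is an assumption for illustration (synthetic tuning vectors, Gaussian noise, a nearest-centroid linear readout); Hung et al. trained linear classifiers on real IT population responses.

```python
import random

random.seed(3)

# Sketch of a linear readout: decode object category from (simulated)
# population response vectors. Tuning vectors and noise are assumptions.

N_UNITS = 50

# Each category evokes a characteristic mean response pattern across units.
prototypes = {c: [random.gauss(0, 1) for _ in range(N_UNITS)]
              for c in ("face", "car")}

def population_response(category, noise=0.8):
    return [m + random.gauss(0, noise) for m in prototypes[category]]

def train_centroids(trials_per_class=30):
    """A minimal linear readout: class centroids; classifying by the nearer
    centroid is a linear discriminant for equal noise covariances."""
    cents = {}
    for c in prototypes:
        trials = [population_response(c) for _ in range(trials_per_class)]
        cents[c] = [sum(v) / len(trials) for v in zip(*trials)]
    return cents

def decode(r, cents):
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda c: dist2(r, cents[c]))

cents = train_centroids()
correct = sum(decode(population_response(c), cents) == c
              for c in ("face", "car") for _ in range(50))
accuracy = correct / 100
```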
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Summer Course at Woods Hole Our flagship initiative
Brains, Minds & Machines Summer Course: an intensive three-week course gives advanced students a "deep" introduction to the problem of intelligence
A self-reproducing community of scholars is being formed: >300 applicants, ~30 accepted
Sponsored fellowships by GoogleX Hidary Foundation + Fujitsu
Ellen Hildreth
Boris Katz
Gabriel Kreiman
Directors
Lizanne DeStefano
Kenny Blum
Kathleen Sullivan
Kris Brewer
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly (Part I + II, Quart. Rev. Biophysics 9(3), 311-375).
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989)
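A minimal sketch of the Hassenstein-Reichardt correlation detector described in these bullets (stimulus and delay are illustrative): each subunit multiplies one receptor's signal by a delayed copy of its neighbour's, and the opponent subtraction yields a direction-selective output.

```python
import numpy as np

def reichardt_detector(left, right, delay):
    """Correlation-type motion detector (Hassenstein-Reichardt):
    each half multiplies one input by a delayed copy of the other;
    opponent subtraction gives a direction-selective output."""
    l_d = np.concatenate([np.zeros(delay), left[:-delay]])   # delayed left input
    r_d = np.concatenate([np.zeros(delay), right[:-delay]])  # delayed right input
    return np.mean(right * l_d - left * r_d)  # positive for left-to-right motion

# A bright bar moving left-to-right: the right receptor sees it `delay` steps later.
t = np.arange(200)
stim = np.exp(-0.5 * ((t - 100) / 5.0) ** 2)
left = stim
right = np.concatenate([np.zeros(4), stim[:-4]])

print(reichardt_detector(left, right, delay=4))   # preferred direction: positive
print(reichardt_detector(right, left, delay=4))   # null direction: negative
```

The sign flip between the two calls is the directional selectivity; the multiplicative nonlinearity is what Hassenstein & Reichardt inferred from optomotor behaviour.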
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio & Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy, and Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
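The 'veto' idea, i.e. direction selectivity arising from a well-timed inhibitory conductance that divides rather than subtracts (shunting inhibition), can be sketched numerically. Constants, names, and signal shapes below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Barlow-Levick-style veto via shunting (divisive) inhibition, in the spirit of
# the synaptic mechanism proposed by Torre & Poggio (1978). Illustrative only.
def veto_response(exc, inh, delay, g_leak=1.0, g_inh=20.0):
    inh_d = np.concatenate([np.zeros(delay), inh[:-delay]])  # delayed inhibition
    # Shunting inhibition divides rather than subtracts: a strong, well-timed
    # inhibitory conductance 'vetoes' the excitatory input.
    return np.sum(exc / (g_leak + g_inh * inh_d))

t = np.arange(100)
pulse = (np.abs(t - 50) < 3).astype(float)
a = pulse                                      # receptor A
b = np.concatenate([np.zeros(5), pulse[:-5]])  # receptor B sees the stimulus later

null = veto_response(exc=b, inh=a, delay=5)       # inhibition arrives with excitation
preferred = veto_response(exc=a, inh=b, delay=5)  # inhibition arrives too late
print(preferred, null)
```

In the null direction the delayed inhibition coincides with the excitation and suppresses it; in the preferred direction the inhibition arrives after the excitation has passed.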
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman,
one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions
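The regularization idea can be made concrete on a toy ill-posed problem: reconstructing a 1-D "surface" from sparse, noisy samples by minimizing a data term plus a smoothness (curvature) penalty, i.e. Tikhonov regularization. The operators, constants, and signal below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
truth = np.sin(np.linspace(0, 2 * np.pi, n))   # underlying "surface"

idx = rng.choice(n, size=12, replace=False)    # sparse measurement sites
d = truth[idx] + 0.05 * rng.normal(size=12)    # noisy depth samples
S = np.zeros((12, n)); S[np.arange(12), idx] = 1.0   # sampling operator

# Ill-posed as stated: 12 equations, 100 unknowns. The regularized problem
# min ||S f - d||^2 + lam ||D f||^2, with D a second-difference (curvature)
# operator, restores uniqueness and stability.
D = np.diff(np.eye(n), n=2, axis=0)
lam = 1e-2
f = np.linalg.solve(S.T @ S + lam * D.T @ D, S.T @ d)
print("rms error vs truth:", np.sqrt(np.mean((f - truth) ** 2)))
```

The normal equations above are exactly the quadratic regularization functional of the text specialized to 1-D surface reconstruction; the smoothness term plays the role of the "natural constraint".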
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
B y V T O R R E f AND T P O G G IO j
f Universita di Genova Istituto di Fisica Genoa Italyt Max-Planck-Institutfur biologische Tubingen Germany
(Communicated by B B Boycott FR8 - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells I t is shown that the hypothesis is consistent with previous behavioural and phy-siological studies of the motion detection process
Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina
Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay
Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)
14 L 409 ] Vol 202 B
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we first (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
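A toy 1-D version of the Marr-Poggio cooperative stereo idea: binary match nodes over (position, disparity), with excitation along constant disparity (continuity) and inhibition along lines of sight (uniqueness), iterated to a stable state. The weights, threshold, image size, and neighbourhoods below are illustrative choices, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
true_disp = np.where(np.arange(n) < n // 2, 1, 3)   # two depth planes

# Random-dot pair: the right image is the left image shifted by the local disparity.
left = rng.integers(0, 2, size=n)
right = np.array([left[(i - true_disp[i]) % n] for i in range(n)])

disparities = np.arange(5)
# State C[d, i] = 1 if a match at position i, disparity d is currently accepted.
C = np.array([[float(left[(i - d) % n] == right[i]) for i in range(n)]
              for d in disparities])

for _ in range(15):
    # Continuity: excitation from neighbours at the same disparity.
    excite = sum(np.roll(C, s, axis=1) for s in (-2, -1, 1, 2))
    # Uniqueness: inhibition from matches at other disparities, same position.
    inhibit = C.sum(axis=0, keepdims=True) - C
    C = ((excite - 0.5 * inhibit + C) > 2.5).astype(float)

recovered = C.argmax(axis=0)
rate = np.mean(recovered == true_disp)
print("fraction of positions at the true disparity:", rate)
```

On this toy pair the iteration suppresses most chance matches and settles near the true two-plane disparity map; positions inside uniform runs of dots remain genuinely ambiguous, as they must.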
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardtrsquos scientific legacy Integrative Neuroscience
• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
— computation — algorithms — biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…
• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models + experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
Regularization in a reproducing kernel Hilbert space \(\mathcal{H}_K\):

\[
\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\bigl(y_i, f(x_i)\bigr) + \mu \, \| f \|_K^2 \right]
\]
Predictive regularization algorithms
Theorems on foundations of learning
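For the square loss, the minimizer of the regularization functional above has, by the representer theorem, the finite form f(x) = Σ_i c_i K(x_i, x) with (K + μℓI)c = y. A small self-contained sketch, with kernel, data, and parameters as illustrative choices:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); sigma is an illustrative choice
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))                  # training inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=60)   # noisy targets

# Regularized least squares: minimize (1/l) sum (y_i - f(x_i))^2 + mu ||f||_K^2.
# Representer theorem: f(x) = sum_i c_i K(x_i, x), with (K + mu * l * I) c = y.
mu = 1e-3
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * len(X) * np.eye(len(X)), y)

X_test = np.linspace(-1, 1, 200)[:, None]
f_test = gaussian_kernel(X_test, X) @ c
err = np.max(np.abs(f_test - np.sin(3 * X_test[:, 0])))
print("max error on [-1, 1]:", err)
```

The regularization parameter μ trades data fit against the RKHS norm; μ → 0 recovers interpolation, large μ over-smooths.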
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…
Received by the editors April 2000 and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory^{1-5} was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
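The deleted-one-example stability described above can be illustrated with a toy numerical experiment. The sketch below is my own illustration, not the paper's formal cross-validation leave-one-out stability definition: it trains regularized least squares on a synthetic training set, deletes one example at a time, and checks that the prediction at the deleted point barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_ridge(X, y, lam=0.1):
    # Regularized least squares: w = (X'X + lam*n*I)^{-1} X'y
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

# Synthetic training set S = {(x_i, y_i)}, i = 1..n
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_full = train_ridge(X, y)

# Perturb S by deleting one example at a time and measure how much
# the learned hypothesis changes at the deleted point
changes = []
for i in range(n):
    keep = np.arange(n) != i
    w_loo = train_ridge(X[keep], y[keep])
    changes.append(abs(X[i] @ w_full - X[i] @ w_loo))

print(max(changes))  # small for a stable (here: regularized) algorithm
```

An unstable procedure, such as unregularized fitting with more parameters than examples, would show much larger changes under the same perturbation.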
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications^6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1 | Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$

$\lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$I[f] = \int_Z V(f, z)\, d\mu(z)$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$\lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right\} = 0$
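A minimal numerical sketch of these definitions (a toy setup of my own, with a fixed hypothesis $f$ and square loss): the empirical error $I_S[f]$ is computable from data, and its gap to the expected error $I[f]$ typically shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return 2.0 * x          # a fixed hypothesis

def V(fx, y):
    return (fx - y) ** 2    # square loss V(f, z) = (f(x) - y)^2

def sample(n):
    # n i.i.d. samples z = (x, y) from a known distribution mu
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 0.5 * rng.normal(size=n)
    return x, y

# Here the expected error is known analytically: I[f] = E[(f(x) - y)^2]
# equals the noise variance, 0.5^2 = 0.25
I_true = 0.25

gaps = []
for n in (10, 100, 10000):
    x, y = sample(n)
    I_S = np.mean(V(f(x), y))   # empirical error I_S[f] = (1/n) sum V(f, z_i)
    gaps.append(abs(I_S - I_true))
print(gaps)  # |I[f] - I_S[f]| is typically small for large n
```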
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human Brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects, such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model): animal present or not?
Image (20 ms) → image-mask interval (ISI, 30 ms) → mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
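The read-out idea can be caricatured in a few lines: train a simple linear classifier on single-trial population response vectors and decode object category. The sketch below uses a hypothetical stand-in for the data (synthetic "neural" responses of my own construction, not the actual recordings of Hung et al.):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical IT-like population: each of 8 categories evokes a
# characteristic mean pattern across 100 "neurons", plus trial noise.
n_neurons, n_categories, n_trials = 100, 8, 50
prototypes = rng.normal(size=(n_categories, n_neurons))

def trials(n_per_cat):
    # Single-trial population responses: prototype + Gaussian noise
    X = np.vstack([prototypes[c] + 0.5 * rng.normal(size=(n_per_cat, n_neurons))
                   for c in range(n_categories)])
    y = np.repeat(np.arange(n_categories), n_per_cat)
    return X, y

X_train, y_train = trials(n_trials)
X_test, y_test = trials(n_trials)

# One-vs-all linear read-out via regularized least squares on one-hot targets
Y = np.eye(n_categories)[y_train]
W = np.linalg.solve(X_train.T @ X_train + 1e-3 * np.eye(n_neurons),
                    X_train.T @ Y)
accuracy = np.mean(np.argmax(X_test @ W, axis=1) == y_test)
print(accuracy)  # a plain linear read-out decodes category well
```

The point of the original experiments was that such a simple linear decoder, applied to real IT population activity, suffices to read out category and identity across position and scale changes.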
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
… in 2013 …
EAC May 2020
CBMM Summer School
• Signature CBMM (Education/Knowledge Transfer) activity aimed at creating an intergenerational community around the science and engineering of intelligence
• Students reported strong influence of lectures, working on projects, and interactions among faculty, TAs and peers on their own thinking and research development
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory in Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other:
single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits; the algorithm (refined by D. Varjú) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
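The Hassenstein-Reichardt correlation scheme described above is easy to state in code: each half-detector multiplies one receptor's signal by a delayed copy of its neighbour's, and the two mirror-symmetric halves are subtracted. A minimal sketch (the stimulus and delay parameters are my own arbitrary choices):

```python
import numpy as np

def reichardt(s1, s2, delay):
    # Each half-detector multiplies one receptor signal by a delayed
    # copy of the neighbouring signal; subtracting the mirror-symmetric
    # half-detectors gives an opponent, direction-selective output.
    d1 = np.roll(s1, delay)
    d2 = np.roll(s2, delay)
    return np.mean(d1 * s2 - d2 * s1)

# A drifting sinusoidal luminance pattern seen by two receptors a
# quarter-cycle apart; reversing direction flips the temporal phase.
t = np.linspace(0.0, 10.0, 1000, endpoint=False)
rightward = (np.sin(2 * np.pi * t), np.sin(2 * np.pi * t - np.pi / 2))
leftward = (np.sin(2 * np.pi * t), np.sin(2 * np.pi * t + np.pi / 2))

r_right = reichardt(*rightward, delay=25)  # delay = quarter period
r_left = reichardt(*leftward, delay=25)
print(r_right, r_left)  # opposite signs: the output is direction-selective
```

The multiplication is the nonlinearity that Hassenstein & Reichardt argued was essential; the opponent subtraction makes the mean output vanish for non-moving (counterphase) stimuli.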
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
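The proposed veto interaction can be caricatured numerically. In a shunting synapse whose reversal potential sits at the resting potential, the inhibitory conductance divides the excitatory signal rather than subtracting from it, so coincident (null-direction) inhibition suppresses the response almost completely. A one-compartment steady-state sketch, with illustrative numbers of my own choosing:

```python
def membrane_response(g_exc, g_inh, g_leak=1.0, E_exc=1.0):
    # Steady-state depolarization of a one-compartment membrane.
    # The inhibitory battery sits at the resting potential, so g_inh
    # appears only in the denominator: it vetoes (divides) the
    # excitatory input but never hyperpolarizes on its own.
    return g_exc * E_exc / (g_exc + g_inh + g_leak)

preferred = membrane_response(g_exc=5.0, g_inh=0.0)   # excitation alone
null = membrane_response(g_exc=5.0, g_inh=50.0)       # coincident shunting inhibition
print(preferred, null)  # the null-direction response is almost vetoed
```

For small conductances this division is approximately a product of the two input signals, which is one way the synaptic veto scheme connects to the multiplicative nonlinearity of the Hassenstein-Reichardt analysis.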
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images^5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman^6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
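Regularization resolves exactly this kind of ambiguity by adding a smoothness assumption. The sketch below is a generic Tikhonov toy problem of my own construction, not the paper's specific optical-flow functional: it recovers a smooth one-dimensional profile from sparse, noisy samples by minimizing ||Af - d||^2 + lam * ||Df||^2, where D penalizes non-smooth solutions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Ill-posed toy problem: most grid points are not measured at all,
# so the data term alone does not determine a unique solution.
n = 100
x = np.linspace(0.0, 1.0, n)
f_true = np.sin(2 * np.pi * x)

measured = np.arange(0, n, 7)                 # sparse sampling locations
A = np.zeros((len(measured), n))
A[np.arange(len(measured)), measured] = 1.0   # sampling operator
d = f_true[measured] + 0.05 * rng.normal(size=len(measured))

# Tikhonov regularization: minimize ||A f - d||^2 + lam * ||D f||^2
D = np.diff(np.eye(n), axis=0)                # first-difference (smoothness) operator
lam = 0.1
f_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)

print(np.max(np.abs(f_hat - f_true)))         # the smoothness prior fills the gaps
```

The normal equations here are exactly the kind of quadratic variational problem the paper maps onto parallel analog networks: a data term plus a stabilizing functional chosen from a natural constraint.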
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the sharp changes in image intensity that correspond to physical edges.
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg
nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2
sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2
0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2
3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2
6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2
sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2
=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2
U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2
$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2
$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2
$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2
$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2
$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2
$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2
$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2
$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2
$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2
sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2
$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2
sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2
sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2
13
13
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
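A drastically simplified 1-D sketch of such a cooperative algorithm follows. Nodes C[(x, d)] stand for "a match at position x with disparity d"; nodes excite neighbours carrying the same disparity (a continuity constraint) and inhibit nodes along the same line of sight (a uniqueness constraint). The parameters, the threshold update and the 1-D setting are illustrative simplifications, not the published 2-D network.

```python
# Cooperative disparity network, 1-D toy version: iterate a thresholded
# sum of same-disparity excitation, cross-disparity inhibition, and the
# initial matches, then read out the winning disparity at each position.
import random

def cooperative_stereo(left, right, max_disp=2, iters=15,
                       excit_radius=2, inhibit_w=2.0, threshold=1.0):
    n = len(left)
    disps = range(-max_disp, max_disp + 1)
    # initial state: 1 wherever the two images match at that disparity
    C0 = {(x, d): 1.0 if 0 <= x + d < n and left[x] == right[x + d] else 0.0
          for x in range(n) for d in disps}
    C = dict(C0)
    for _ in range(iters):
        new = {}
        for (x, d) in C:
            excit = sum(C.get((x + dx, d), 0.0)
                        for dx in range(-excit_radius, excit_radius + 1)
                        if dx != 0)
            inhib = sum(C[(x, d2)] for d2 in disps if d2 != d)
            s = excit - inhibit_w * inhib + C0[(x, d)]
            new[(x, d)] = 1.0 if s >= threshold else 0.0
        C = new
    return [max(disps, key=lambda d: C[(x, d)]) for x in range(n)]

# a 1-D "random-dot" pair: the right image is the left shifted by one
random.seed(0)
left = [random.randint(0, 9) for _ in range(30)]
right = left[1:] + [-1]          # true disparity is -1 everywhere
print(cooperative_stereo(left, right))
```

With `right` equal to `left` shifted by one, the network should settle on disparity -1 at almost every position; real random-dot stereograms and the published network add a second spatial dimension and 2-D excitatory neighbourhoods.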
Cooperative Computation of Stereo Disparity
D. Marr; T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \mu \, \|f\|_K^2 \right]$$
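For the square loss and a linear hypothesis class, the regularized minimization on the slide has a closed-form solution. Below is a minimal sketch for lines f(x) = w·x + b with only the slope penalized; the data and the value of mu are invented for illustration, and this 1-D variant stands in for the general kernel case.

```python
# Regularized least squares on a line:
#   min_{w,b} (1/n) * sum_i (w*x_i + b - y_i)^2 + mu * w^2
# Setting the derivatives to zero gives b = mean(y) - w*mean(x) and
#   w = cov(x, y) / (var(x) + mu),
# so larger mu shrinks the slope toward 0 (the stabilizer at work).

def ridge_line(xs, ys, mu):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    w = cov / (var + mu)
    b = my - w * mx
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]          # roughly y = 2x

w0, b0 = ridge_line(xs, ys, mu=0.0)     # ordinary least squares
w1, b1 = ridge_line(xs, ys, mu=10.0)    # heavily regularized
print(round(w0, 2), round(w1, 2))       # prints: 1.99 0.33
```

With mu = 0 this reduces to ordinary least squares; increasing mu trades a larger training error for a smaller, more stable solution, which is exactly the role of the second term in the functional.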
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In…
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
©2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| \ge \varepsilon) = 0$.
Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$
Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.
Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.
Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z) \, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.
Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$
Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\!\left( I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right) = 1$$
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419
© 2004 Nature Publishing Group
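The Box 1 quantities can be illustrated with a few lines of code: the empirical error I_S[f] is an average over the training set, while the expected error I[f] is approximated here by the average loss on a large fresh sample from the same distribution. The distribution and the small hypothesis space below are invented for illustration.

```python
# Empirical vs expected error for ERM over a tiny hypothesis space,
# with square loss. As n grows, the gap |I[f_S] - I_S[f_S]| shrinks,
# which is what "generalization" means in the Box 1 sense.
import random

random.seed(1)

def sample(n):
    """n pairs (x, y) with y = x plus bounded uniform noise."""
    return [(x, x + random.uniform(-0.5, 0.5))
            for x in (random.uniform(-1, 1) for _ in range(n))]

# a small hypothesis space H of linear functions f(x) = w*x
H = [lambda x, w=w: w * x for w in (0.0, 0.5, 1.0, 1.5, 2.0)]

def empirical_error(f, S):
    """I_S[f] = (1/n) * sum_i V(f, z_i) with square loss."""
    return sum((f(x) - y) ** 2 for x, y in S) / len(S)

def erm(S):
    """Empirical risk minimization: the f in H minimizing I_S[f]."""
    return min(H, key=lambda f: empirical_error(f, S))

proxy = sample(100000)        # large fresh sample, stands in for I[f]
for n in (10, 10000):
    S = sample(n)
    f_S = erm(S)
    gap = abs(empirical_error(f_S, proxy) - empirical_error(f_S, S))
    print(n, round(gap, 4))
```

Here the 100,000-sample `proxy` plays the role of the unknown expected error; nothing in the code requires knowing the distribution in closed form, which is exactly the situation the theory addresses.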
Why do hierarchical architectures work
• Training database • 1,000+ real, 3,000+ virtual • 500,000+ non-face patterns
Sung & Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human Brain – 10^10-10^11 neurons (~1 million flies) – 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey – ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
– ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
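The view-based module behind those predictions can be caricatured in a few lines: each "view-tuned unit" is a Gaussian radial basis function centred on one stored view of an object, and a downstream unit pools over them to approximate view invariance. The feature vector, widths and stored angles below are invented for illustration, not the model's actual parameters.

```python
# View-tuned units as Gaussian RBFs over stored views: response is
# maximal at the training view and falls off gradually with rotation,
# qualitatively like the IT cells described above.
import math

def view_features(angle_deg):
    """Stand-in for the measured features of an object seen at some angle."""
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a), math.cos(2 * a), math.sin(2 * a)]

def rbf(x, center, sigma=0.8):
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * sigma ** 2))

stored_views = [view_features(a) for a in (0, 90, 180, 270)]  # training views

def view_tuned_responses(angle_deg):
    x = view_features(angle_deg)
    return [rbf(x, c) for c in stored_views]

def view_invariant_unit(angle_deg):
    return sum(view_tuned_responses(angle_deg))  # pooled over tuned units

r = view_tuned_responses(0)
print(r[0] > max(r[1:]))                   # unit 0 prefers its training view
print(view_tuned_responses(30)[0] < r[0])  # response falls off with rotation
```

Several such units, each a blurred template for a neighbourhood of one view, can jointly cover all vantage points, which is the ensemble coding idea in the Logothetis et al. conclusion.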
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Stimulus sequence: image (20 ms), image-mask interval (30 ms ISI), mask of 1/f noise (80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
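The "matrix-like read-out" idea reduces to training a linear classifier on population activity. A toy version with a simulated, position-tolerant population is sketched below; the response model, noise level and unit count are all invented for illustration, and only the logic (weighted sum of many units decodes category at an untrained position) reflects the experiments.

```python
# Linear read-out from a simulated population: train a perceptron on
# responses at a few positions, then decode category at a position the
# classifier never saw. Decoding transfers because the simulated units
# are (by construction) position-tolerant.
import random

random.seed(2)
N_UNITS = 50
# each unit has a fixed category preference that does not change with
# stimulus position -- a caricature of position tolerance
pref = [random.choice([-1.0, 1.0]) for _ in range(N_UNITS)]

def population_response(category, position):
    """Noisy population vector; position only changes the noise draw."""
    rng = random.Random(category * 1000 + position)
    return [pref[i] * category + 0.3 * rng.gauss(0, 1) for i in range(N_UNITS)]

def train_readout(samples, epochs=20):
    """Perceptron-style linear readout from (response, label) pairs."""
    w = [0.0] * N_UNITS
    for _ in range(epochs):
        for r, c in samples:
            if sum(wi * ri for wi, ri in zip(w, r)) * c <= 0:
                w = [wi + c * ri for wi, ri in zip(w, r)]
    return w

def decode(w, r):
    return 1 if sum(wi * ri for wi, ri in zip(w, r)) > 0 else -1

# train on responses at positions 0..4, test at an unseen position 9
train = [(population_response(c, p), c) for c in (-1, 1) for p in range(5)]
w = train_readout(train)
print([decode(w, population_response(c, 9)) for c in (-1, 1)])
```

The classifier never sees position 9 during training, so decoding succeeds only to the degree the units are position-tolerant, which is the invariance property the read-out experiments probed.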
… in 2013 …
understand how the brain works (then) make intelligent machines
WHY
Our vision and mission
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM SummerSchool
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single frame analysis; 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
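The multiplicative scheme these bullets describe can be sketched as an opponent pair of correlators: each half multiplies one receptor's signal by a delayed (here: low-pass filtered) copy of its neighbour's, and the detector output is the difference of the two mirror-symmetric halves. The stimulus, filter time constant and temporal frequency below are illustrative choices, not values from the fly literature.

```python
# Minimal Hassenstein-Reichardt correlator: opponent difference of two
# delay-and-multiply half-detectors; its mean output is direction-signed.
import math

def low_pass(signal, tau=3.0):
    """First-order low-pass filter, the 'delay' branch of the detector."""
    out, y = [], 0.0
    a = 1.0 / tau
    for s in signal:
        y += a * (s - y)
        out.append(y)
    return out

def reichardt(r1, r2):
    """Mean opponent output: positive for motion from receptor 1 to 2."""
    f1, f2 = low_pass(r1), low_pass(r2)
    n = len(r1)
    return sum(f1[t] * r2[t] - f2[t] * r1[t] for t in range(n)) / n

def drifting_grating(phase_shift, n=200):
    """Sinusoidal luminance at two nearby points; the sign of
    phase_shift sets the direction of drift."""
    r1 = [math.sin(0.3 * t) for t in range(n)]
    r2 = [math.sin(0.3 * t - phase_shift) for t in range(n)]
    return r1, r2

rightward = reichardt(*drifting_grating(+0.7))
leftward = reichardt(*drifting_grating(-0.7))
print(rightward > 0 > leftward)
```

The opponent subtraction makes the mean output direction-signed: positive for a pattern drifting from receptor 1 towards receptor 2 and negative for the reverse, with no response component that survives for a stationary flicker.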
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry Reichardt Poggio Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems Hence it is not surprising tha t the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques from behavioural analysis and psychophysics to physiology Although several investigators have provided a wealth of infor-mation in the last years the early analyses of Hassenstein amp Reichardt (1956) Reichardt (1957 1961) Barlow amp Hill (1963) and Barlow amp Levick (1965) still represent the extent of our understanding of this function These studies are in many respects complementary Those of Reichardt amp Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect whereas Barlow amp Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina
Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay
Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)
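The Barlow-Levick "veto" idea contrasted in the excerpt can be sketched numerically. A toy illustration (the delay, pulse shape, and simple subtract-and-rectify nonlinearity are illustrative assumptions, not the paper's biophysical model):

```python
import numpy as np

def veto_detector(r1, r2, delay=10):
    """Barlow-Levick-style AND-NOT scheme (toy version).

    Excitation from receptor 1 is vetoed by delayed inhibition from
    receptor 2, so the null direction is silenced rather than the
    preferred direction being amplified.
    """
    r2_delayed = np.concatenate([np.zeros(delay), r2[:-delay]])
    return np.clip(r1 - r2_delayed, 0, None)   # inhibition vetoes excitation

t = np.arange(100)
pulse = (np.abs(t - 50) < 5).astype(float)     # stimulus at receptor 1

# Preferred direction: stimulus reaches receptor 1 first, then receptor 2;
# the delayed inhibition arrives too late to cancel the excitation.
pref = veto_detector(pulse, np.concatenate([np.zeros(10), pulse[:-10]]))
# Null direction: receptor 2 is hit first; its delayed inhibition lands
# exactly on the excitation and vetoes it.
null = veto_detector(pulse, np.concatenate([pulse[10:], np.zeros(10)]))
print(pref.sum() > null.sum())
```

Note the asymmetry with the Reichardt scheme: here direction selectivity comes from suppressing the null direction (an inhibitory veto), not from multiplicative facilitation of the preferred one.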
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
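The regularization idea can be made concrete on a toy surface-reconstruction problem. A sketch in standard Tikhonov form, minimizing ||Af - d||² + λ||Df||² (the grid, sample points and λ below are illustrative choices):

```python
import numpy as np

# Surface reconstruction from sparse, noisy depth samples posed as
# Tikhonov regularization: A samples the unknown surface f at known
# points, D is a roughness (second-difference) penalty. Sampling alone
# is ill-posed (many surfaces fit the data); the smoothness term makes
# the solution unique and stable.

n = 100                                    # grid points of the unknown surface f
xs = np.array([5, 20, 40, 60, 85])         # indices where depth data exist
d = np.array([0.0, 1.0, 0.5, -0.5, 0.2])   # sparse noisy measurements

A = np.zeros((len(xs), n))
A[np.arange(len(xs)), xs] = 1.0            # sampling operator
D = np.diff(np.eye(n), n=2, axis=0)        # second-difference smoothness operator
lam = 1.0                                  # regularization parameter

# Normal equations of the quadratic functional: (A^T A + lam D^T D) f = A^T d
f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)
print(f.shape)
```

The regularization parameter λ trades data fidelity against smoothness; the quadratic form also makes the solution the output of a linear (and, as the paper argues, potentially analog/parallel) network.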
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images (ref. 5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (ref. 6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
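The normal/tangential decomposition behind this aperture problem can be checked numerically. A sketch with illustrative vectors (not taken from the paper's figure):

```python
import numpy as np

# The aperture problem: local measurements along a contour give only the
# component of the velocity normal to the contour. Given the true velocity V
# and the local tangent direction, only V . n is observable locally.

V = np.array([1.0, 0.5])                       # true image velocity at a contour point
tangent = np.array([1.0, 1.0]) / np.sqrt(2)    # unit tangent of the contour
normal = np.array([-tangent[1], tangent[0]])   # unit normal to the contour

v_normal = np.dot(V, normal)     # the only locally measurable quantity
v_tangent = np.dot(V, tangent)   # invisible to purely local measurements

# Infinitely many velocities share the same normal component, so the
# full flow is underdetermined without extra constraints:
V_alt = v_normal * normal + 2.0 * tangent
print(np.isclose(np.dot(V_alt, normal), v_normal))
```

Any velocity of the form `v_normal * normal + c * tangent` is consistent with the local measurement, which is exactly why the flow must be made unique by added assumptions (e.g. smoothness).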
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tuebingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
Stable URL: http://links.jstor.org/sici?sici=0036-8075%2819761015%293%3A194%3A4262%3C283%3ACCOSD%3E2.0.CO%3B2-1
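The cooperative algorithm of the paper can be sketched in one dimension: a binary node C[d, x] represents "position x has disparity d"; nearby nodes at the same disparity excite each other, while nodes competing for the same position at different disparities inhibit each other. The neighborhood size, weights, threshold and iteration count below are illustrative choices, not the paper's constants:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_d, max_d = 60, 3, 6
left = rng.integers(0, 2, n)          # 1-D random-dot "image"
right = np.roll(left, true_d)         # right image: left shifted by true_d

# Possible matches: a node is allowed only where the images agree at
# disparity d. At d = true_d the images agree everywhere.
C0 = np.array([(np.roll(left, d) == right).astype(float)
               for d in range(max_d + 1)])
C = C0.copy()

for _ in range(10):                   # cooperative relaxation
    # excitation: support from neighbors along x at the same disparity
    excit = np.array([np.convolve(C[d], np.ones(7), mode="same")
                      for d in range(max_d + 1)])
    # inhibition: rival disparities claiming the same position x
    inhib = C.sum(axis=0) - C
    # false matches lose support and switch off; new matches never appear
    C = C0 * (excit - 0.5 * inhib > 2.5)

recovered = int(C.sum(axis=1).argmax())   # most populated disparity layer
print(recovered)
```

The excitatory/inhibitory structure implements the paper's two matching constraints: continuity (nearby points have similar disparity) and uniqueness (each position gets one disparity), so the spurious random matches die out while the true disparity layer survives.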
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \mu \, \|f\|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
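The stability property described in the abstract can be illustrated with a toy learner: train on a set S, then on S with one example deleted, and compare the two hypotheses. A sketch using one-dimensional regularized least squares on synthetic data (all values below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(-1, 1, n)
y = 2.0 * x + 0.1 * rng.standard_normal(n)   # noisy linear targets

def train(xs, ys, lam=0.1):
    # one-dimensional ridge regression: w = <x, y> / (<x, x> + lam)
    return np.dot(xs, ys) / (np.dot(xs, xs) + lam)

w_full = train(x, y)                          # hypothesis learned from S
w_loo = train(np.delete(x, 0), np.delete(y, 0))   # S with one example deleted
change = abs(w_full - w_loo)
print(change)                                 # small: the hypothesis barely moves
```

For this learner the change is O(1/n), which is the kind of leave-one-out stability the paper connects to generalization; an unstable learner (say, one that interpolates every point exactly) would not have this property.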
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left(|X_n - X| > \varepsilon\right) = 0.$$

Training data. The training data comprise input and output pairs. The input space $X$ is assumed to be a compact domain in a Euclidean space, and the output space $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote by $V(f, z)$ the price we pay when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error on a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called the empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \right) = 1.$$
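The quantities defined in Box 1 can be illustrated with a small numerical sketch (a toy setup of our own, not from the paper): draw n samples from a known distribution, fit f_S by empirical risk minimization over linear functions with the square loss, and compare the empirical error I_S[f_S] with a Monte Carlo estimate of the expected error I[f_S].

```python
import random, math

random.seed(0)

def sample(n):
    # z = (x, y) with y = sin(2*pi*x) + noise; this distribution plays the
    # role of the unknown mu (we know it only because we built the toy)
    return [(x, math.sin(2 * math.pi * x) + random.gauss(0, 0.1))
            for x in (random.random() for _ in range(n))]

def erm_linear(S):
    # ERM with square loss over the hypothesis space of linear functions:
    # the minimizer of I_S[f] is the ordinary least-squares line
    n = len(S)
    mx = sum(x for x, _ in S) / n
    my = sum(y for _, y in S) / n
    var = sum((x - mx) ** 2 for x, _ in S)
    cov = sum((x - mx) * (y - my) for x, y in S)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def error(f, pts):
    # average square loss V(f, z) = (f(x) - y)^2 over pts
    return sum((f(x) - y) ** 2 for x, y in pts) / len(pts)

S = sample(50)
f_S = erm_linear(S)
empirical = error(f_S, S)             # I_S[f_S], computable from the data
expected = error(f_S, sample(20000))  # Monte Carlo estimate of I[f_S]
print(empirical, expected)
```

Generalization, in the sense of Box 1, means the gap |I[f_S] - I_S[f_S]| shrinks as n grows; rerunning the sketch with larger n shows the two numbers approaching each other.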
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
Training database:
• 1,000+ real, 3,000+ virtual face patterns
• 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
On the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain:
– 10^10–10^11 neurons (~1 million flies)
– 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image (20 ms), then image-mask interval (30 ms ISI), then mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
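The read-out idea can be sketched in a few lines (entirely synthetic data standing in for the recorded IT population; the unit model and all numbers are our own assumptions, not the published data): train a linear classifier on population vectors generated at one stimulus position, then test it at another position to probe position-invariant category information.

```python
import random
random.seed(1)

N = 60  # hypothetical number of recorded units

# toy model of a unit: a category signal mixed with position-dependent
# modulation plus noise (a stand-in for real IT responses)
cat_w = [random.gauss(0, 1) for _ in range(N)]
pos_w = [random.gauss(0, 1) for _ in range(N)]

def population_response(category, position):
    return [cat_w[i] * category + pos_w[i] * position + random.gauss(0, 0.5)
            for i in range(N)]

def make_set(position, n):
    data = []
    for _ in range(n):
        c = random.choice([-1, 1])
        data.append((population_response(c, position), c))
    return data

# train a linear readout (perceptron) with stimuli shown at position 0
w = [0.0] * N
for _ in range(20):
    for x, c in make_set(0.0, 50):
        if c * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + c * xi for wi, xi in zip(w, x)]

# test at a different position: does the readout generalize ("invariance")?
test = make_set(1.0, 200)
acc = sum(1 for x, c in test
          if c * sum(wi * xi for wi, xi in zip(w, x)) > 0) / len(test)
print(acc)
```

Because the category signal is shared across positions while the position term acts like a fixed offset, the classifier trained at one position still decodes category well above chance at the other, which is the qualitative point of the read-out experiments.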
… in 2013 …
Recent Success Stories in AI are based on RL and DL
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt, A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ('energy') model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
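The delay-and-correlate scheme in these bullets can be sketched directly (a textbook form of the Hassenstein-Reichardt correlator with illustrative signals and parameters, not the original data): each subunit multiplies one receptor signal with a delayed copy of its neighbour's, and the two mirror-symmetric subunits are subtracted, so the sign of the output reports the direction of motion.

```python
import math

def reichardt_output(signal_a, signal_b, delay=5):
    # opponent Hassenstein-Reichardt correlator: each half multiplies one
    # input with a delayed copy of the neighbour; the halves are subtracted
    out = 0.0
    for t in range(delay, len(signal_a)):
        out += signal_a[t - delay] * signal_b[t] - signal_b[t - delay] * signal_a[t]
    return out

# a grating drifting from receptor A towards receptor B: B sees A's signal later
phase_lag = 0.5
t_axis = [0.1 * t for t in range(400)]
a = [math.sin(x) for x in t_axis]
b = [math.sin(x - phase_lag) for x in t_axis]

rightward = reichardt_output(a, b)   # preferred direction: positive output
leftward = reichardt_output(b, a)    # reversed inputs: the sign flips
print(rightward, leftward)
```

The subtraction of the two mirror-symmetric halves makes the detector exactly antisymmetric: swapping the inputs flips the sign of the output, which is the opponency property the model is known for.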
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
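The contrast between the two nonlinear interactions can be made concrete with a toy simulation (our own illustration; the divisive form of the 'veto' is one plausible choice, not the paper's biophysical model): a multiplicative detector and an AND-NOT veto detector, each probed with a pulse crossing two receptors in the preferred and the null direction.

```python
def pulse(start, width=10, length=40):
    # binary light/dark step seen by one receptor
    return [1.0 if start <= t < start + width else 0.0 for t in range(length)]

def delayed(x, d):
    # channel containing a pure delay of d samples
    return [0.0] * d + x[:len(x) - d]

def correlate(exc, ref):
    # multiplicative interaction (Hassenstein & Reichardt scheme)
    return sum(a * b for a, b in zip(exc, ref))

def veto(exc, inh, k=10.0):
    # inhibitory 'veto' (Barlow & Levick scheme); shunting division is one
    # illustrative way to realize the AND-NOT operation discussed here
    return sum(e / (1.0 + k * i) for e, i in zip(exc, inh))

# two receptors A and B; the stimulus crosses them 10 samples apart
a_first, b_then = pulse(10), pulse(20)   # motion A -> B
b_first, a_then = pulse(10), pulse(20)   # motion B -> A

# multiplicative detector tuned to A -> B: correlate delayed A with B
mult_pref = correlate(delayed(a_first, 10), b_then)   # coincidence: large
mult_null = correlate(delayed(a_then, 10), b_first)   # no coincidence: zero

# veto detector tuned to A -> B: excitation from A, delayed inhibition from B
veto_pref = veto(a_first, delayed(b_then, 10))   # inhibition arrives too late
veto_null = veto(a_then, delayed(b_first, 10))   # excitation is vetoed
print(mult_pref, mult_null, veto_pref, veto_null)
```

Both schemes discriminate direction, but for opposite reasons: the multiplicative detector responds when delayed excitation coincides, while the veto detector responds by default and is suppressed when delayed inhibition coincides with excitation.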
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
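A minimal numerical illustration of the regularization idea (our own toy, not the paper's formulation): numerical differentiation of noisy data, a standard example of an ill-posed early-vision computation related to edge detection, is unstable, and adding a stabilizing smoothness term restores stability. The weight of the smoothness penalty and the coordinate-descent solver are illustrative choices.

```python
import random, math
random.seed(2)

# differentiating noisy samples is ill-posed: small perturbations of the
# data are amplified by the factor 1/h in the finite difference
xs = [0.01 * i for i in range(300)]
noisy = [math.sin(x) + random.gauss(0, 0.01) for x in xs]

def derivative(f, h=0.01):
    return [(f[i + 1] - f[i]) / h for i in range(len(f) - 1)]

def smooth(f, lam=10.0, sweeps=200):
    # regularized estimate g: minimize sum (g_i - f_i)^2 (data term)
    # + lam * sum (g_{i+1} - g_i)^2 (stabilizing smoothness term),
    # solved here by simple coordinate descent
    g = list(f)
    for _ in range(sweeps):
        for i in range(1, len(g) - 1):
            g[i] = (f[i] + lam * (g[i - 1] + g[i + 1])) / (1 + 2 * lam)
    return g

def max_error(est):
    # compare with the true derivative cos(x), away from the boundaries
    true = [math.cos(x) for x in xs]
    return max(abs(e - t) for e, t in list(zip(est, true))[5:-5])

raw = max_error(derivative(noisy))          # noise blown up by 1/h
reg = max_error(derivative(smooth(noisy)))  # regularization restores stability
print(raw, reg)
```

The stabilizer trades a small bias for a large reduction in variance, which is the essence of turning an ill-posed problem into a well-posed one.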
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
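The aperture argument in this paragraph amounts to a one-line projection, sketched below (coordinates and sign conventions are our own): two velocity fields that share the same normal component along a contour are indistinguishable to purely local measurements.

```python
def normal_component(v, tangent):
    # project velocity v onto the unit normal of a contour whose unit
    # tangent is given; (nx, ny) = (-ty, tx) is one normal convention
    nx, ny = -tangent[1], tangent[0]
    return v[0] * nx + v[1] * ny

# a vertical contour (tangent along y): only horizontal motion is measurable
tangent = (0.0, 1.0)
v1 = (1.0, 0.0)   # pure horizontal motion
v2 = (1.0, 2.5)   # same horizontal motion plus a tangential slide

m1 = normal_component(v1, tangent)
m2 = normal_component(v2, tangent)
print(m1, m2)   # identical: the tangential component is invisible
```

Since any tangential component projects to zero, recovering the full velocity field requires extra information or assumptions, exactly as the text states.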
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers, but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
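The cooperative algorithm described in the excerpt can be illustrated in a few lines. This is a toy one-dimensional sketch under assumed parameters (the neighbourhood size, inhibition constant eps and threshold theta are illustrative choices, not the paper's): each binary "match" cell receives excitation from neighbours at the same disparity, inhibition from competing matches along its two lines of sight, plus its initial state, and is then thresholded.

```python
import random

def cooperative_stereo(left, right, max_disp=3, n_iter=8, eps=0.5, theta=3.0):
    """Toy 1-D Marr-Poggio-style cooperative stereo matching.

    State C[x][j] = 1 means 'match at position x with disparity disps[j]'.
    Each iteration a cell sums excitatory support from neighbours at the
    SAME disparity, subtracts eps times the activity along its two lines
    of sight (competing disparities), adds its initial state, and
    thresholds the result.
    """
    n = len(left)
    disps = list(range(-max_disp, max_disp + 1))
    # Initial state: all binary matches compatible with the two images.
    C0 = [[1.0 if 0 <= x + d < n and left[x] == right[x + d] else 0.0
           for d in disps] for x in range(n)]
    C = [row[:] for row in C0]
    for _ in range(n_iter):
        new = [[0.0] * len(disps) for _ in range(n)]
        for x in range(n):
            for j, d in enumerate(disps):
                # Excitation: neighbours along x at the same disparity.
                excite = sum(C[x2][j]
                             for x2 in range(max(0, x - 4), min(n, x + 5))
                             if x2 != x)
                # Inhibition: other disparities sharing either line of sight.
                inhibit = 0.0
                for j2, d2 in enumerate(disps):
                    if j2 == j:
                        continue
                    inhibit += C[x][j2]          # same left-image position
                    x2 = x + d - d2              # same right-image position
                    if 0 <= x2 < n:
                        inhibit += C[x2][j2]
                s = excite - eps * inhibit + C0[x][j]
                new[x][j] = 1.0 if s >= theta else 0.0
        C = new
    return C, disps
```

On a random-dot stereogram containing a single true shift, the layer at the true disparity fills in under mutual excitation while spurious matches are suppressed by the line-of-sight inhibition.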
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1
Science is currently published by the American Association for the Advancement of Science.
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
bull Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
bull The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
bull …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
bull …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics
bull Computer vision
bull Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
min_{f ∈ H_K} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖²_K ]
Predictive regularization algorithms
Theorems on foundations of learning
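For the square loss V(y, f(x)) = (y − f(x))², the regularized functional above has a closed-form solution by the representer theorem: f(x) = Σ_i c_i K(x, x_i), with coefficients solving (K + μnI)c = y. A minimal sketch (the Gaussian kernel and all parameter values are illustrative choices):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # K[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2)) for 1-D inputs
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(x_train, y_train, mu=1e-2, sigma=1.0):
    """Regularized least squares in an RKHS.

    Minimizes (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2.
    By the representer theorem f(x) = sum_i c_i K(x, x_i),
    and the coefficients solve (K + mu * n * I) c = y.
    """
    n = len(x_train)
    K = gaussian_kernel(x_train, x_train, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y_train)
    def f(x_new):
        return gaussian_kernel(np.atleast_1d(x_new), x_train, sigma) @ c
    return f
```

Small μ gives near-interpolation of the training data; larger μ trades data fit for smoothness (a smaller RKHS norm).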
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i)_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

  S = (z_1 = (x_1, y_1), …, z_n = (x_n, y_n))

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as

  I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

  I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

  I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

  lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,

  lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] > ε ) = 0.
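The stability property at the heart of the paper (deleting one training example changes the learned hypothesis only slightly) can be probed numerically. A toy sketch, with a one-parameter ridge regressor standing in for a stable learning algorithm (the model and all constants are illustrative, not the paper's setup):

```python
import random

def fit_ridge_1d(xs, ys, lam=0.1):
    """Ridge regression through the origin:
    minimize sum_i (y_i - w x_i)^2 + lam * n * w^2, solved in closed form."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam * n)

def loo_stability(xs, ys, lam=0.1):
    """Worst-case change in the prediction at the deleted point when one
    example is removed from the training set (a CVloo-style stability)."""
    w_full = fit_ridge_1d(xs, ys, lam)
    worst = 0.0
    for i in range(len(xs)):
        w_i = fit_ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        worst = max(worst, abs((w_full - w_i) * xs[i]))
    return worst
```

For a stable algorithm this quantity should shrink as the training set grows, which is exactly the behaviour the theory ties to generalization.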
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
bull Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
bull Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
bull Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Stimulus sequence: image (20 ms) → interval image-mask (ISI 30 ms) → mask, 1/f noise (80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
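The "matrix-like read-out" is, concretely, a linear classifier applied to population response vectors. A toy sketch with a synthetic IT-like population (the perceptron read-out, the population size, and the noise level are all illustrative; Hung et al. used recorded IT responses and standard linear classifiers):

```python
import random

random.seed(0)

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Linear read-out: a perceptron on population response vectors X
    (one row per stimulus presentation), binary labels y in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

# Synthetic 'IT-like' population: 100 units whose mean rates differ by category.
N_UNITS = 100
mean_rates = {+1: [random.gauss(0, 1) for _ in range(N_UNITS)],
              -1: [random.gauss(0, 1) for _ in range(N_UNITS)]}

def population_response(category, noise=0.5):
    # one noisy population vector for a presentation of this category
    return [m + random.gauss(0, noise) for m in mean_rates[category]]

labels = [random.choice([+1, -1]) for _ in range(200)]
X = [population_response(c) for c in labels]
w, b = train_perceptron(X[:150], labels[:150])
correct = sum(1 for xi, yi in zip(X[150:], labels[150:])
              if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0)
```

Because the category signal is distributed across many units, a simple linear decoder suffices, which is the point of the readout experiments.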
… in 2013 …
DL and RL come from neuroscience
Minsky's SNARC
RL
DL
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca Domestica, Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
most behavioral fly research was done with the Götz torque meter
in 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375
open question: how well does this theory describe fly behavior of natural flight?
in 1980 Wehrhahn started high-speed film recording of flies chasing each other
single frame analysis 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly…
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975
Work at 3 levels
bull Fixation and tracking behavior of the fly (cognition in the fly…similar to Bayesian approach to cognition in humans…no neurons)
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
bull The beetle follows the motion
bull Each photoreceptor sees only an alternation of dark and light: how is motion computed?
bull Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: Reichardt detector
bull The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
bull An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
bull A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Buelthoff, Little and Poggio, Nature 1989)
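The Hassenstein-Reichardt correlation detector described above fits in a few lines: each arm low-pass filters (delays) one photoreceptor signal and multiplies it with the undelayed signal of its neighbour, and the two mirror-symmetric arms are subtracted. This is a minimal sketch; the filter time constant and the stimulus are illustrative choices.

```python
import math

def lowpass(signal, alpha=0.2):
    """First-order low-pass filter: the 'delay' arm of the detector."""
    out, y = [], 0.0
    for s in signal:
        y += alpha * (s - y)
        out.append(y)
    return out

def reichardt(a, b, alpha=0.2):
    """Correlation-type motion detector on two photoreceptor signals.

    Each arm multiplies the delayed signal of one receptor with the
    undelayed signal of its neighbour; the mirror-symmetric arms are
    subtracted. Output > 0 for motion from a towards b, < 0 otherwise.
    """
    ad, bd = lowpass(a, alpha), lowpass(b, alpha)
    return sum(adt * bt - bdt * at
               for adt, bt, bdt, at in zip(ad, b, bd, a)) / len(a)

# A drifting sine grating seen by two nearby receptors: b sees what a saw
# a moment earlier, i.e. motion goes from a towards b.
T = 2000
a = [math.sin(0.05 * t) for t in range(T)]
b = [math.sin(0.05 * t - 0.5) for t in range(T)]
```

Reversing the stimulus direction (swapping the inputs) flips the sign of the output, which is the defining signature of a direction-selective correlator.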
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
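The veto idea can be illustrated with a toy shunting-inhibition unit: a delayed inhibitory conductance divides the excitatory signal, so excitation coinciding with (delayed) inhibition is suppressed. This is only an illustration of the Barlow & Levick AND-NOT scheme; the filter constant, shunting gain k, and bar stimulus are all assumed parameters, not values from the paper.

```python
def leaky_integrator(signal, alpha=0.05):
    """Slow first-order filter: the asymmetric delay on the inhibitory channel."""
    out, y = [], 0.0
    for s in signal:
        y += alpha * (s - y)
        out.append(y)
    return out

def veto_unit(excite, inhibit, k=20.0):
    """Direction-selective unit based on shunting ('veto') inhibition.

    The delayed inhibitory conductance divides the excitatory signal, so
    excitation that coincides with the delayed inhibition is vetoed.
    """
    inh = leaky_integrator(inhibit)
    return sum(e / (1.0 + k * i) for e, i in zip(excite, inh)) / len(excite)

def bar(t0, width=30, T=400):
    # a bright bar crossing one receptor: on from t0 to t0 + width
    return [1.0 if t0 <= t < t0 + width else 0.0 for t in range(T)]

# Preferred direction: the bar reaches the exciting receptor first,
# so excitation is over before the delayed inhibition builds up.
pref_resp = veto_unit(excite=bar(100), inhibit=bar(140))
# Null direction: inhibition is triggered first and vetoes the excitation.
null_resp = veto_unit(excite=bar(140), inhibit=bar(100))
```

The asymmetry of the delay, not the stimulus, creates the direction selectivity: the same two bars in the opposite temporal order give a much smaller response.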
© 1985 Nature Publishing Group
Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and that are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies many of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
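The regularization idea can be made concrete: an ill-posed problem is replaced by the minimization of a data term plus a stabilizing functional. A minimal numerical sketch (the setup and all parameter values are my own illustrative choices, not from the paper): reconstruct a smooth 1-D "surface" from sparse noisy samples by penalizing the discrete second derivative.

```python
import numpy as np

# Tikhonov-style regularization sketch (illustrative): recover a smooth 1-D
# profile z from sparse noisy samples y by minimizing
#     ||S z - y||^2 + lam * ||D z||^2
# where S samples z at known locations and D is a discrete second-derivative
# operator acting as the stabilizing (smoothness) functional.

def reconstruct(n, sample_idx, y, lam):
    S = np.zeros((len(sample_idx), n))
    S[np.arange(len(sample_idx)), sample_idx] = 1.0
    # discrete second-derivative operator
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # normal equations: (S^T S + lam D^T D) z = S^T y
    A = S.T @ S + lam * D.T @ D
    return np.linalg.solve(A, S.T @ y)

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
truth = np.sin(2 * np.pi * x)
idx = np.arange(0, n, 7)                       # sparse sample locations
y = truth[idx] + 0.05 * rng.standard_normal(len(idx))
z = reconstruct(n, idx, y, lam=1.0)
print(np.max(np.abs(z - truth)))               # small reconstruction error
```

Without the smoothness term the problem is underdetermined (15 samples, 100 unknowns); the stabilizer selects the physically plausible smooth solution.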
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½D sketch and to Barrow and Tenenbaum's intrinsic images^5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. Since so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman^6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
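This aperture problem can be written down directly: the local brightness-constancy constraint is one linear equation in the two velocity components, so it fixes only the projection of the velocity onto the intensity gradient. A small sketch (the numbers are illustrative, not from the paper):

```python
import numpy as np

# The aperture problem in one equation: the brightness-constancy constraint
#     Ix*u + Iy*v + It = 0
# is a single linear equation in the two unknowns (u, v), so a purely local
# measurement determines only the component of velocity along the intensity
# gradient (the normal component); the tangential component is invisible.

def normal_flow(Ix, Iy, It):
    g = np.array([Ix, Iy])
    return -It * g / (g @ g)           # velocity component along the gradient

# A vertical edge (Iy = 0) translating: any vertical motion is invisible.
true_v = np.array([1.0, 3.0])          # actual image velocity
Ix, Iy = 2.0, 0.0                      # gradient of a vertical edge
It = -(Ix * true_v[0] + Iy * true_v[1])
v_n = normal_flow(Ix, Iy, It)
print(v_n)                             # [1. 0.]: the horizontal (normal) part
                                       # is recovered, the vertical part lost
```

Recovering the full field therefore requires an extra assumption, e.g. smoothness of the velocity field along the contour, which is exactly the kind of constraint regularization supplies.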
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978)
Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in recent years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
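The two schemes summarized in figure 1 can be contrasted in a few lines of simulation (the discretization and all parameters are my own illustrative choices, not the paper's): a Hassenstein-Reichardt detector multiplies one receptor's signal with a delayed copy of its neighbour's, while a Barlow-Levick unit lets a delayed signal from the null side 'veto' the excitatory channel.

```python
import numpy as np

def delay(s, d):
    # pure delay line: shift the signal by d samples
    return np.concatenate([np.zeros(d), s[:-d]])

def reichardt(r1, r2, d=3):
    # opponent multiplicative correlator; the sign encodes direction
    return np.sum(delay(r1, d) * r2 - r1 * delay(r2, d))

def barlow_levick(r1, r2, d=3):
    # excitation from receptor 1, vetoed by delayed inhibition from receptor 2:
    # in the null direction the veto arrives together with the excitation
    return np.sum(np.clip(r1 - delay(r2, d), 0.0, None))

t = np.arange(40.0)
pulse = lambda t0: np.exp(-0.5 * ((t - t0) / 1.5) ** 2)

pref = (pulse(10), pulse(13))   # preferred: stimulus hits receptor 1 first
null = (pulse(13), pulse(10))   # null: stimulus hits receptor 2 first

print(reichardt(*pref) > 0 > reichardt(*null))        # True
print(barlow_levick(*pref) > barlow_levick(*null))    # True
```

Both units are direction-selective, but for different reasons: the correlator's response changes sign with direction, while the veto unit's response to the null direction is suppressed toward zero by the delayed inhibition.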
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but are probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
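The flavour of the cooperative stereo computation can be conveyed with a 1-D toy version (all parameters, neighborhoods, and the simplified uniqueness term below are my own illustrative choices, not the paper's): a binary state C[x, d] marks a candidate match at position x and disparity d; matches at the same disparity excite nearby positions (continuity), matches at other disparities for the same position inhibit each other (uniqueness), and a threshold keeps the state binary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, ndisp, true_d = 60, 5, 2
left = rng.integers(0, 2, n)
right = np.roll(left, true_d)            # right image = left shifted by true_d

# initial state: 1 wherever the two images agree at disparity d
C = np.stack([(left == np.roll(right, -d)).astype(float)
              for d in range(ndisp)], axis=1)
C0 = C.copy()

for _ in range(10):
    # excitation: same-disparity neighbors within distance 2 (continuity)
    exc = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))
    # inhibition: competing disparities at the same position (uniqueness)
    inh = C.sum(axis=1, keepdims=True) - C
    C = ((exc - inh + C0) > 1.5).astype(float)

print(np.argmax(C.sum(axis=0)))          # the true disparity should dominate
```

Spurious agreements occur at roughly half the positions for every false disparity, yet the local excitatory and inhibitory constraints drive the network to a global state in which only the correct disparity survives, which is the sense in which the computation is cooperative.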
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; afterword by Tomaso Poggio

David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience

• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor

How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
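With the square loss V(y, f(x)) = (y - f(x))², minimizing this regularized functional over a reproducing-kernel Hilbert space gives regularized least squares: the minimizer has the form f(x) = Σᵢ cᵢ K(x, xᵢ) with c = (K + μnI)⁻¹y. A minimal sketch (Gaussian kernel; all parameter values are illustrative choices of mine):

```python
import numpy as np

# Regularized least squares (kernel ridge regression): solve
#     (K + mu*n*I) c = y
# where K is the kernel matrix on the training inputs.

def gaussian_kernel(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu):
    K = gaussian_kernel(X, X)
    n = len(X)
    return np.linalg.solve(K + mu * n * np.eye(n), y)

def rls_predict(X_train, c, X_test):
    return gaussian_kernel(X_test, X_train) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)

c = rls_fit(X, y, mu=1e-3)
X_test = np.linspace(-1, 1, 20)[:, None]
err = np.max(np.abs(rls_predict(X, c, X_test) - np.sin(3 * X_test[:, 0])))
print(err)    # small test error despite the noisy training labels
```

The parameter μ trades off fit against the norm penalty, playing the same stabilizing role as the regularizer in the ill-posed vision problems discussed earlier in the deck.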
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory^1-5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
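The stability property in the abstract can be probed numerically: retrain with one example deleted and check how little the hypothesis moves. A toy sketch using plain ridge regression as a simplified stand-in (my own setup, not the paper's formal CV-loo stability definition):

```python
import numpy as np

# Leave-one-out stability sketch: train on S and on S with example i deleted,
# then compare the two hypotheses' predictions over the whole input set.

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

w_full = ridge_fit(X, y, lam=1.0)
changes = []
for i in range(n):
    mask = np.arange(n) != i
    w_i = ridge_fit(X[mask], y[mask], lam=1.0)
    # largest prediction change anywhere when example i is deleted
    changes.append(float(np.max(np.abs(X @ (w_full - w_i)))))
print(max(changes))    # small: deleting one example barely moves the hypothesis
```

For a stable, regularized algorithm this perturbation shrinks as n grows, which is exactly the property the paper connects to generalization.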
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications^6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables, such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not it was produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}, i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n - X| = 0 in probability) if and only if for every ε > 0
$$\lim_{n\to\infty} \mathbb{P}\big(|X_n - X| > \varepsilon\big) = 0$$

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}$$

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.

Expected error. The expected error of a function f is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample z drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
$$\lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] > \inf_{f \in H} I[f] + \varepsilon \Big) = 0$$
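The Box 1 quantities can be illustrated numerically with a toy setup of my own: take H to be constant functions on a grid (the inputs x are then irrelevant), use the square loss, and compare the empirical error of the ERM solution with its expected error, which is computable in closed form here.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.linspace(-1.0, 2.0, 61)      # hypothesis space: constants f(x) = c

def draw(n):
    # samples z = (x, y) with y = 0.5 + noise; x plays no role for constants
    return 0.5 + 0.3 * rng.standard_normal(n)

def gaps(n, trials=200):
    out = []
    for _ in range(trials):
        y = draw(n)
        emp = ((y[:, None] - H[None, :]) ** 2).mean(axis=0)  # I_S[f] for all f
        j = np.argmin(emp)                                   # ERM choice f_S
        expected = (H[j] - 0.5) ** 2 + 0.3 ** 2              # I[f_S], closed form
        out.append(abs(expected - emp[j]))
    return float(np.mean(out))

print(gaps(10), gaps(1000))    # the generalization gap shrinks with n
```

For this small hypothesis space ERM generalizes, and the average gap |I[f_S] - I_S[f_S]| visibly decreases as the training set grows, which is the convergence Box 1 formalizes.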
letters to nature
Nature, Vol. 428, 25 March 2004, p. 419. www.nature.com/nature. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1000+ real and 3000+ virtual face patterns; 500,000+ non-face patterns

Sung & Poggio 1995

~15-year-old CBCL computer vision research: face detection, on the market since 2006 (digital cameras)

Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses

Vision: what is where

• Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
- ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early prediction: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis & Pauls 1995
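The prediction can be illustrated with a minimal sketch in the spirit of the Poggio-Edelman RBF scheme: Gaussian view-tuned units whose response falls off as the object rotates away from a preferred view, pooled into a view-invariant unit. All parameters below are invented for illustration, not taken from the published model.

```python
import math

def view_tuned_unit(view_deg, preferred_deg, sigma_deg=30.0):
    """Gaussian tuning over viewpoint: response declines gradually as the
    object is rotated away from the unit's preferred view (cf. Logothetis
    et al. 1995). sigma_deg is an invented tuning width."""
    d = (view_deg - preferred_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.exp(-(d * d) / (2.0 * sigma_deg ** 2))

def view_invariant_unit(view_deg, preferred_views):
    """Pool (max) over several view-tuned units to approximate
    view-invariant recognition from a small set of stored views."""
    return max(view_tuned_unit(view_deg, p) for p in preferred_views)

# A unit tuned to the 0-degree view responds maximally there
assert view_tuned_unit(0, 0) == 1.0
# ... and only weakly 90 degrees away
assert view_tuned_unit(90, 0) < 0.05
# Pooling a handful of views restores a strong response everywhere
assert view_invariant_unit(90, [0, 90, 180, 270]) == 1.0
```

The number and spacing of stored views, and the tuning width, are the free choices; the published work fits such quantities to behavioral and neural data.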
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image (20 ms) → ISI, image-mask interval (30 ms) → Mask, 1/f noise (80 ms)
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
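The architectural motif at the heart of these hierarchical models (e.g. HMAX) is an alternation of template matching ("S" stages, for selectivity) and max pooling ("C" stages, for invariance). A toy two-stage sketch of that motif, not the Serre et al. implementation:

```python
import numpy as np

def s_layer(image, template):
    """Tuning ("S") stage: match a template at every valid position."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * template)
    return out

def c_layer(resp, pool=2):
    """Invariance ("C") stage: max-pool over local position."""
    H, W = resp.shape
    return np.array([[resp[i:i + pool, j:j + pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

# Max pooling makes the response tolerant to a small shift of the feature.
img1 = np.zeros((6, 6)); img1[1, 1] = 1.0
img2 = np.zeros((6, 6)); img2[1, 2] = 1.0   # same feature, shifted one pixel
t = np.ones((1, 1))
assert c_layer(s_layer(img1, t)).max() == c_layer(s_layer(img2, t)).max()
```

Stacking several such S/C pairs yields increasingly complex preferred stimuli and increasing position and scale invariance, mirroring the ventral-stream progression described earlier.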
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
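The "matrix-like read-out" amounts to training a linear classifier on the vector of population responses. A toy version on synthetic "neurons" (population size, tuning and noise are all invented here, not the Hung et al. recordings) shows the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: 50 "neurons", each with a random category
# preference; 2 categories x 40 presentations, with trial-by-trial noise.
n_neurons, n_trials = 50, 40
pref = rng.normal(size=n_neurons)
X, y = [], []
for label in (-1.0, 1.0):
    for _ in range(n_trials):
        X.append(label * pref + 0.5 * rng.normal(size=n_neurons))
        y.append(label)
X, y = np.array(X), np.array(y)

# Linear read-out: least-squares weights w such that X @ w ~ y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
accuracy = np.mean(np.sign(X @ w) == y)
assert accuracy > 0.9   # category is linearly decodable from the population
```

The same weights can be tested on responses to shifted or rescaled stimuli; invariant decoding of the kind reported in the paper corresponds to the accuracy surviving those transformations.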
… in 2013 …
We focus on the combination of neuroscience and engineering to make progress on the problem of intelligence because, as in the recent past, several of the next breakthroughs in ML and AI are likely to come from neuroscience AND engineering.
Vision for the BMM Summer School
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tübingen, MPI für BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
23
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
26
Fixation and tracking behavior
Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik 12, 185-203, 1972
27
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter
In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly (Parts I + II, Quart. Rev. Biophysics 9(3), 311-375)
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other
Single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
30
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
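The correlation detector named in the bullets above can be sketched as a delay-and-multiply opponent circuit fed by two adjacent photoreceptors. This is a discrete-time toy; the unit delay stands in for the low-pass filter of the original Hassenstein-Reichardt model.

```python
def reichardt_detector(a, b, delay=1):
    """Opponent correlation detector (after Hassenstein & Reichardt 1956).
    a, b: luminance time series at two adjacent photoreceptors.
    Output > 0 for motion from a to b, < 0 for the reverse direction."""
    out = []
    for t in range(delay, len(a)):
        # delayed signal from one arm correlated with the other arm,
        # minus the mirror-image term (opponent subtraction)
        out.append(a[t - delay] * b[t] - b[t - delay] * a[t])
    return sum(out)

# A bright spot moving a -> b: receptor b sees it one time step after a.
a = [0, 1, 0, 0, 0]
b = [0, 0, 1, 0, 0]
assert reichardt_detector(a, b) > 0   # preferred direction
assert reichardt_detector(b, a) < 0   # null direction
```

Each photoreceptor alone sees only a dark/light alternation, as the slide says; it is the delayed correlation between the two that makes the direction of motion explicit.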
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
36
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
39
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. — Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
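The "veto" scheme discussed above, which Torre & Poggio model as shunting synaptic inhibition, divides the excitatory signal rather than subtracting from it, approximating a logical AND-NOT. A caricature in a few lines (the gain and the timing convention are invented for illustration):

```python
def veto_unit(excitation, delayed_inhibition, g=10.0):
    """Shunting ('veto') interaction: inhibition divides the excitatory
    signal rather than subtracting from it; for large conductance g this
    approximates a logical AND-NOT gate."""
    return excitation / (1.0 + g * delayed_inhibition)

# Null direction: the asymmetrically delayed inhibition arrives together
# with the excitation and vetoes it. Preferred direction: the inhibition
# arrives too late, so the excitation passes through unchanged.
preferred = veto_unit(excitation=1.0, delayed_inhibition=0.0)
null = veto_unit(excitation=1.0, delayed_inhibition=1.0)
assert preferred == 1.0
assert null < 0.1
```

Note the contrast with the Reichardt scheme: there, direction selectivity comes from a multiplicative conjunction of two signals; here it comes from a divisive veto, yet both nonlinear interactions yield direction-selective responses.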
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA; Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
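The regularization recipe the article builds toward, restoring uniqueness to an underdetermined measurement by adding a stabilizing smoothness term, can be sketched on a toy 1-D reconstruction problem. This is standard Tikhonov regularization; the discretization and weights are illustrative, not the paper's analog circuits.

```python
import numpy as np

def reconstruct(samples, n, lam=1.0):
    """Tikhonov-regularized 1-D reconstruction: find f (length n) minimizing
        sum_{i in samples} (f[i] - d_i)^2  +  lam * sum_k (f[k+1] - f[k])^2.
    Without the smoothness term the unmeasured points are arbitrary (the
    problem is ill-posed); with it the solution is unique."""
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, d in samples.items():               # data (fidelity) term
        A[i, i] += 1.0
        b[i] += d
    for k in range(n - 1):                     # first-difference stabilizer
        A[k, k] += lam
        A[k + 1, k + 1] += lam
        A[k, k + 1] -= lam
        A[k + 1, k] -= lam
    return np.linalg.solve(A, b)               # normal equations A f = b

# Only the two endpoints are measured; the stabilizer fills the gap smoothly.
f = reconstruct({0: 0.0, 9: 9.0}, n=10, lam=1.0)
assert abs(f[5] - 5.0) < 0.5   # interior interpolates roughly linearly
```

The same structure (quadratic data term plus quadratic stabilizer) underlies the paper's treatment of optical flow, surface reconstruction and edge detection, which is why a single framework, and a single class of parallel analog circuits, covers all of them.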
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
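The paper's cooperative algorithm iterates local excitation among candidate matches of the same disparity (surface continuity) and inhibition among competing matches at the same position (uniqueness along each line of sight) until a consistent disparity map remains. A drastically simplified 1-D caricature, with thresholds and weights invented for illustration:

```python
import numpy as np

def cooperative_stereo(init, steps=10, theta=1.0, eps=2.0):
    """Toy Marr-Poggio style update on a (positions x disparities) grid of
    binary match units: support from same-disparity neighbours, inhibition
    from competing disparities at the same position, then a threshold."""
    C = init.astype(float)
    for _ in range(steps):
        excite = np.zeros_like(C)
        excite[1:, :] += C[:-1, :]       # left neighbour, same disparity
        excite[:-1, :] += C[1:, :]       # right neighbour, same disparity
        inhibit = C.sum(axis=1, keepdims=True) - C   # rivals at same position
        C = ((excite - eps * inhibit + init) > theta).astype(float)
    return C

# Initial candidate matches: true disparity 0 at every position, plus one
# isolated spurious match at another disparity.
init = np.zeros((8, 3)); init[:, 0] = 1.0; init[4, 2] = 1.0
final = cooperative_stereo(init)
assert final[:, 0].all()          # the mutually consistent layer survives
assert final[4, 2] == 0.0         # the isolated false match is suppressed
```

The qualitative behavior, spurious matches dying out while a smooth, unique disparity surface self-reinforces, is the point of the cooperative scheme; the published algorithm operates on random-dot stereograms in 2-D with tuned parameters.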
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by …; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
(computation, algorithms, biophysics and circuits)
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]
Predictive regularization algorithms
Theorems on foundations of learning
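As a concrete illustration (not from the slides), the regularization scheme on this slide, empirical loss plus an RKHS norm penalty, has a closed-form minimizer for the square loss by the representer theorem: f(x) = sum_i c_i K(x, x_i) with c = (K + mu*n*I)^(-1) y. A minimal Python sketch, with a Gaussian kernel and arbitrarily chosen parameter values:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-||A[i]-B[j]||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_regularized(X, y, mu=0.1, sigma=1.0):
    # Square loss + mu * ||f||_K^2; representer theorem gives c = (K + mu*n*I)^(-1) y
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Fit a noisy sine from examples (synthetic data for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
f = fit_regularized(X, y)
print(f(np.array([[0.0]])))
```

The penalty term mu*n on the diagonal is what makes the estimate stable and predictive rather than an interpolant of the noise.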
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio (1), Ryan Rifkin (1,4), Sayan Mukherjee (1,3) & Partha Niyogi (2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
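The delete-one-example perturbation at the heart of this stability property is easy to probe numerically. A hypothetical sketch, with regularized linear least squares standing in for a general stable algorithm and entirely synthetic data:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Regularized least squares: w = (X^T X + lam*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w_full = ridge_fit(X, y)
# Delete one example and refit: a stable algorithm changes its hypothesis only slightly
w_loo = ridge_fit(np.delete(X, 0, axis=0), np.delete(y, 0))
change = np.linalg.norm(w_full - w_loo)
print(change)
```

For this algorithm the change shrinks on the order of 1/n as the training set grows, which is the kind of quantitative stability the paper connects to generalization.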
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}, i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n->inf} |X_n - X| = 0 in probability) if and only if, for every eps > 0, lim_{n->inf} P(|X_n - X| >= eps) = 0.
Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution mu(x, y) on the product space Z = X x Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)}.
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : union_{n>=1} Z^n -> H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.
Expected error. The expected error of a function f is defined as
I[f] = integral_Z V(f, z) dmu(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = integral_{X x Y} (f(x) - y)^2 dmu(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution mu.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) sum_{i=1}^{n} V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution mu,
lim_{n->inf} |I[f_S] - I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution mu and any eps > 0,
lim_{n->inf} P( I[f_S] <= inf_{f in H} I[f] + eps ) = 1.
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group.
Why do hierarchical architectures work?
• Training database:
• 1,000+ real, 3,000+ virtual face patterns
• 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human brain: ~10^10-10^11 neurons (~1 million flies), ~10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
~15 x 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
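The view-tuning those experiments report can be caricatured with Gaussian radial basis units over viewing angle, in the spirit of the Poggio-Edelman network: each unit peaks at a trained view and falls off as the object rotates away, and a small population of such units, pooled together, approximates view-invariant evidence for the object. All widths and preferred views below are made-up numbers:

```python
import numpy as np

def view_tuned_response(theta, theta_pref, sigma=20.0):
    # Gaussian tuning over viewing angle (degrees): maximal at the trained view,
    # declining gradually as the object rotates away from it
    d = (theta - theta_pref + 180) % 360 - 180   # wrap the angular difference
    return np.exp(-d**2 / (2 * sigma**2))

angles = np.arange(0, 360, 10)
# A small population of units tuned to different trained views of one object
prefs = [0, 90, 180, 270]
pop = np.array([[view_tuned_response(a, p) for p in prefs] for a in angles])
# Summing over the population approximates view-invariant object evidence
evidence = pop.sum(axis=1)
```

A single unit here acts like a blurred template for a neighborhood of one view; the ensemble covers all vantage points.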
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image (20 ms), image-mask interval (ISI, 30 ms), mask (1/f noise, 80 ms)
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
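The hierarchical feedforward idea alternates template matching ("simple", S) and max pooling ("complex", C) stages. A toy two-layer sketch of that motif; the image and templates are invented, and real HMAX-style models use Gabor filters, many scales, and learned intermediate templates:

```python
import numpy as np

def s_layer(image, templates):
    # Simple-cell stage: template match at every valid position
    h, w = templates[0].shape
    H, W = image.shape
    out = np.zeros((len(templates), H - h + 1, W - w + 1))
    for k, t in enumerate(templates):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[k, i, j] = (image[i:i+h, j:j+w] * t).sum()
    return out

def c_layer(s_maps):
    # Complex-cell stage: max over position gives local shift invariance
    return s_maps.max(axis=(1, 2))

img = np.zeros((8, 8)); img[2:5, 3] = 1.0    # a vertical bar
bar = np.zeros((3, 3)); bar[:, 1] = 1.0      # matching vertical template
responses = c_layer(s_layer(img, [bar, bar.T]))
print(responses)
```

Stacking such S/C pairs is what yields the gradual increase in preferred-stimulus complexity and in position/scale invariance described for the ventral stream.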
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
... in 2013 ...
A quick recap of 40 of the last ~50 years of neuroscience and ML through my eyes
1972-2013
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr Ruska (center) Photo dated Nov 17 1952 (courtesy B Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis and 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly...
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly... similar to Bayesian approach to cognition in humans... no neurons)
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
bull The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Buelthoff, Little and Poggio, Nature 1989).
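The Reichardt detector itself is two mirror-symmetric delay-and-multiply arms whose outputs are subtracted, so the sign of the time-averaged output reports the direction of motion. A minimal discrete-time sketch; the delay length and sinusoidal stimulus are arbitrary choices:

```python
import numpy as np

def reichardt(left, right, delay=3):
    # Hassenstein-Reichardt correlation detector: each arm multiplies one
    # photoreceptor signal with the *delayed* signal of its neighbour;
    # opponent subtraction makes the mean output signed by direction.
    l_d = np.roll(left, delay); l_d[:delay] = 0    # crude delay line
    r_d = np.roll(right, delay); r_d[:delay] = 0
    return (l_d * right - r_d * left).mean()

t = np.arange(200)
stim = np.sin(0.2 * t)
# Rightward motion: the right receptor sees the stimulus later than the left
out_right = reichardt(stim, np.roll(stim, 3))
out_left = reichardt(np.roll(stim, 3), stim)
print(out_right, out_left)
```

The multiplication is the nonlinear interaction Hassenstein & Reichardt argued for; the same correlation structure reappears in the "energy" model of cortical motion cells.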
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons...
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Universita di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut fur biologische Kybernetik, Tubingen, Germany.
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
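The synaptic mechanism the paper proposes, shunting inhibition with a reversal potential near rest, acts divisively on the excitatory input and so implements the delayed veto (an AND-NOT gate) numerically. A caricature of that idea; the conductance values and impulse-like inputs below are invented for illustration:

```python
import numpy as np

def veto_unit(exc, inh, delay=3, g_rest=1.0, g_inh=10.0):
    # Shunting ("silent") inhibition as a veto: a large inhibitory conductance
    # with reversal potential at rest divides down the excitatory response
    inh_d = np.roll(inh, delay); inh_d[:delay] = 0
    return exc / (g_rest + exc + g_inh * inh_d)

# Two adjacent receptors; a stimulus moving in the null direction reaches the
# inhibitory receptor first, so the delayed veto coincides with excitation
exc = np.zeros(20); exc[10] = 1.0
inh_null = np.zeros(20); inh_null[7] = 1.0    # leads excitation by the delay
inh_pref = np.zeros(20); inh_pref[13] = 1.0   # trails it: no veto
r_null = veto_unit(exc, inh_null).max()
r_pref = veto_unit(exc, inh_pref).max()
print(r_pref, r_null)
```

Preferred-direction motion passes nearly unattenuated while null-direction motion is suppressed, without any explicit multiplication: the division by the summed conductances supplies the required nonlinearity.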
© 1985 Nature Publishing Group
Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
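In its standard (Tikhonov) quadratic form, the regularization approach turns an ill-posed inverse problem into a well-posed minimization: for data y related to the unknown z by an operator A, one chooses the z minimizing

```latex
\min_{z}\; \|Az - y\|^{2} + \lambda \,\|Pz\|^{2}
```

where P is a stabilizing (typically smoothness) operator and the parameter lambda trades fidelity to the data against the physical plausibility of the solution. The early vision problems discussed below are each reduced to a functional of this form.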
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images (5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
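The normal/tangential decomposition just described (the aperture problem) is a two-line computation. The concrete velocity and contour direction below are arbitrary examples:

```python
import numpy as np

# A local detector on a contour sees only the component of the true
# velocity V along the contour normal n; the tangential part is invisible.
V = np.array([2.0, 1.0])                     # true image velocity (hypothetical)
t_hat = np.array([1.0, 1.0]) / np.sqrt(2)    # unit tangent to the contour
n_hat = np.array([1.0, -1.0]) / np.sqrt(2)   # unit normal

v_normal = (V @ n_hat) * n_hat               # measurable component
v_tangent = (V @ t_hat) * t_hat              # lost to local measurement
print(v_normal, v_tangent)
```

Since only v_normal is available from the image, recovering the full field requires the extra smoothness assumptions that the regularization framework supplies.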
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
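The two schemes contrasted here, multiplicative correlation versus an inhibitory veto, can be compared in a toy simulation. All signals, delays and indices below are illustrative, not taken from the original papers:

```python
import numpy as np

# Toy comparison: multiplicative (Hassenstein-Reichardt) interaction
# versus an inhibitory "veto" (Barlow-Levick, AND-NOT) interaction.

def receptor_signals(direction, n=100, delay=5):
    """Two adjacent receptors see a travelling pulse; order depends on direction."""
    a, b = np.zeros(n), np.zeros(n)
    a[40] = 1.0
    b[40 + delay if direction == "right" else 40 - delay] = 1.0
    return a, b

def delayed(x, d=5):
    """Delay a signal by d time steps."""
    return np.concatenate([np.zeros(d), x[:-d]])

def multiplicative_response(a, b, d=5):
    """Correlate the delayed signal from receptor A with the signal from B."""
    return np.sum(delayed(a, d) * b)

def veto_response(a, b, d=5):
    """Receptor B excites unless the delayed signal from A vetoes it."""
    return np.sum(np.clip(b - delayed(a, d), 0.0, None))

# The multiplicative detector responds in its preferred direction;
# the veto detector is instead silenced (vetoed) in its null direction.
mult_pref = multiplicative_response(*receptor_signals("right"))
veto_null = veto_response(*receptor_signals("right"))
```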
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on the problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process; his model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
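A cooperative stereo network of this kind iterates local excitation along continuous surfaces and inhibition along lines of sight. A minimal 1-D sketch in the spirit of the Marr-Poggio algorithm; the state encoding, neighbourhoods and constants are illustrative assumptions, not the published parameters:

```python
import numpy as np

def cooperative_stereo_step(C, excit=2.0, inhib=1.0, theta=3.0):
    """One update of a 1-D cooperative disparity network.

    C[x, d] is 1 where a match at position x, disparity d is active.
    Excitation comes from spatial neighbours at the same disparity
    (continuity); inhibition from competing disparities at the same
    position (uniqueness). Constants are illustrative.
    """
    support = np.zeros_like(C, dtype=float)
    support[1:, :] += C[:-1, :]          # left neighbour, same disparity
    support[:-1, :] += C[1:, :]          # right neighbour, same disparity
    inhibition = C.sum(axis=1, keepdims=True) - C  # other disparities at x
    total = excit * support - inhib * inhibition + C
    return (total >= theta).astype(int)  # threshold nonlinearity

C = np.zeros((5, 3), dtype=int)
C[:, 1] = 1   # a continuous surface at disparity 1
C[2, 0] = 1   # one spurious match
C = cooperative_stereo_step(C)
# After one iteration the spurious match is suppressed, the surface survives.
```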
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
bull Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
bull The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…
bull …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…
bull …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics
bull Computer vision
bull Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
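The regularization functional on this slide, empirical error plus $\mu\|f\|_K^2$ minimized over a reproducing-kernel Hilbert space, has a closed-form solution for the square loss: $f(x) = \sum_i c_i K(x, x_i)$ with $(K + \mu\ell I)c = y$. A minimal kernel ridge regression sketch, where the kernel, data and $\mu$ are illustrative assumptions:

```python
import numpy as np

# Minimal kernel ridge regression: Tikhonov regularization in an RKHS
# with square loss has the solution f(x) = sum_i c_i K(x, x_i),
# where (K + mu * l * I) c = y. Kernel choice and mu are illustrative.

def gaussian_kernel(A, B, sigma=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def fit(X, y, mu=1e-4):
    l = len(X)
    K = gaussian_kernel(X, X)
    c = np.linalg.solve(K + mu * l * np.eye(l), y)
    return lambda Xnew: gaussian_kernel(Xnew, X) @ c

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
f = fit(X, y)
# The regularized fit closely tracks the target on the training points.
```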
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = (x_i, y_i)_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$, $\lim_{n\to\infty} P\big(|X_n - X| \ge \epsilon\big) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big)$.

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$I[f] = \int_Z V(f, z)\, d\mu(z)$,
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$.
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$.

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0$ in probability.
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$\lim_{n\to\infty} P\Big( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \epsilon \Big) = 0$.
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
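The stability property at the heart of this letter, that deleting one training example changes the learned hypothesis only slightly, can be checked empirically for a simple learning map. A toy sketch with regularized least squares as the learning algorithm; the data and the regularization constant are illustrative assumptions:

```python
import numpy as np

# Empirical look at leave-one-out stability: retrain after deleting one
# example and measure how much the learned hypothesis changes.

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
y = 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

def learn(X, y, mu=0.1):
    """Regularized least squares: solve (X^T X + mu*n*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + mu * len(y) * np.eye(d), X.T @ y)

w_full = learn(X, y)
changes = []
for i in range(n):
    keep = np.arange(n) != i
    w_loo = learn(X[keep], y[keep])
    changes.append(np.linalg.norm(w_full - w_loo))

# For a stable algorithm the change from deleting one of n examples is small.
max_change = max(changes)
```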
Why do hierarchical architectures work?
bull Training Database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
bull Human Brain
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses
Vision: what is where
bull Ventral stream in rhesus monkey
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during
view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
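The view-tuned units predicted by the model can be sketched as Gaussian radial-basis functions centered on stored views, in the spirit of the Poggio-Edelman network; the tuning width and the angle encoding below are illustrative assumptions:

```python
import numpy as np

# View-tuned unit as a Gaussian radial-basis function centered on a
# stored view. Tuning width and the 1-D "view angle" encoding are
# illustrative, not the published model parameters.

def view_tuned_response(view_angle, preferred_angle, sigma=30.0):
    """Response falls off gradually as the object rotates away from
    the unit's preferred view (angles in degrees)."""
    d = (view_angle - preferred_angle + 180) % 360 - 180  # wrapped difference
    return np.exp(-d**2 / (2 * sigma**2))

# A small population tuned to different views of the same object:
preferred = [0, 90, 180, 270]
population = lambda a: [view_tuned_response(a, p) for p in preferred]
# Each vantage point activates a few units; the ensemble covers all views.
```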
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image 20 ms → interval (image-mask) 30 ms ISI → mask (1/f noise) 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
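Such hierarchical feedforward models alternate template-matching ("S") and max-pooling ("C") stages; pooling is what buys tolerance to position and scale. A toy sketch of one S/C stage, where the filters, sizes and input are illustrative assumptions:

```python
import numpy as np

# One S/C stage of an HMAX-style feedforward hierarchy:
# S units do tuned template matching; C units max-pool over positions.

def s_layer(image, templates):
    """S units: template matching at every position (valid correlation)."""
    h, w = templates.shape[1:]
    H, W = image.shape
    out = np.empty((len(templates), H - h + 1, W - w + 1))
    for k, t in enumerate(templates):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[k, i, j] = np.sum(image[i:i+h, j:j+w] * t)
    return out

def c_layer(s_maps, pool=2):
    """C units: max pooling over local positions -> translation tolerance."""
    k, H, W = s_maps.shape
    return s_maps.reshape(k, H // pool, pool, W // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(1)
image = rng.standard_normal((9, 9))
templates = rng.standard_normal((4, 2, 2))
c = c_layer(s_layer(image, templates))
# c has one spatially pooled map per template.
```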
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
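The matrix-like read-out can be illustrated with a linear decoder applied to a synthetic population of IT-like units; all data below are simulated, standing in for (not reproducing) the recorded responses:

```python
import numpy as np

# Linear readout from a population of IT-like units: a simple linear
# classifier trained on population response vectors reports category.
# Synthetic data, illustrative only.

rng = np.random.default_rng(2)
n_units, n_trials = 64, 200
# Two categories with different mean population response patterns:
patterns = rng.standard_normal((2, n_units))
labels = rng.integers(0, 2, n_trials)
responses = patterns[labels] + 0.5 * rng.standard_normal((n_trials, n_units))

# Least-squares linear readout (a stand-in for the classifiers used in
# population decoding studies):
w, *_ = np.linalg.lstsq(responses, 2.0 * labels - 1.0, rcond=None)
predicted = (responses @ w > 0).astype(int)
accuracy = (predicted == labels).mean()
# Decoding accuracy is far above the 50% chance level.
```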
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Tuebingen MPI fuer BK (1972-1981)
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI fuer Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3-D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
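The correlation scheme in the bullets above can be sketched in a few lines: each subunit multiplies a delayed photoreceptor signal with its neighbour's undelayed signal, and the two mirror-symmetric subunits are subtracted. This is a toy discrete-time version with an invented delay and stimulus, not a fit to any fly data.

```python
import math

# Toy discrete-time Hassenstein-Reichardt correlation detector:
# each half multiplies a delayed input with the neighbour's undelayed
# input; subtracting the mirror half gives a signed, direction-
# selective output.

def reichardt(a, b, delay=1):
    """Mean opponent output for photoreceptor time series a and b."""
    out = [a[t - delay] * b[t] - b[t - delay] * a[t]
           for t in range(delay, len(a))]
    return sum(out) / len(out)

# A rectified moving grating: receptor B sees the pattern one step later.
a = [max(0.0, math.sin(0.3 * t)) for t in range(200)]
b = [0.0] + a[:-1]

rightward = reichardt(a, b)   # pattern moving from A towards B
leftward = reichardt(b, a)    # the same pattern moving the opposite way
# rightward > 0 and leftward < 0: the sign of the output encodes direction.
```

The multiplication is the essential nonlinearity that Hassenstein & Reichardt inferred behaviorally; the opponent subtraction makes the detector antisymmetric under motion reversal.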
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
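The veto idea can be sketched numerically: a shunting (divisive) inhibitory conductance suppresses excitation when the two arrive together, as happens in the null direction. The pulse shapes, delay and gain below are invented for illustration; the paper's treatment is biophysical, in terms of synaptic conductances, which this toy only caricatures.

```python
# Toy sketch of the AND-NOT veto via shunting inhibition (all numbers
# invented): a divisive inhibitory conductance g_i suppresses
# excitation e when the two coincide.

def shunting_response(e, g_i, k=20.0):
    """Divisive (shunting) interaction: inhibition scales excitation down."""
    return e / (1.0 + k * g_i)

def direction_response(excitation, inhibition, delay):
    """Summed output when inhibition from the neighbouring receptor
    arrives `delay` time steps after its stimulus."""
    total = 0.0
    for t in range(len(excitation)):
        g = inhibition[t - delay] if t >= delay else 0.0
        total += shunting_response(excitation[t], g)
    return total

def pulse_at(t0, n=12):
    """A two-step stimulus pulse starting at time t0."""
    return [1.0 if t0 <= t < t0 + 2 else 0.0 for t in range(n)]

delay = 3
# Preferred direction: excitatory receptor stimulated first (t=2),
# inhibitory receptor later (t=5); the delayed veto arrives too late.
pref = direction_response(pulse_at(2), pulse_at(5), delay)
# Null direction: inhibitory receptor first (t=2), excitatory at t=5;
# the delayed inhibition now coincides with excitation and vetoes it.
null = direction_response(pulse_at(5), pulse_at(2), delay)
```

Because the inhibition acts divisively rather than subtractively, it is nearly silent on its own yet nearly multiplicative in its interaction with excitation, reconciling the veto scheme with Reichardt's multiplicative nonlinearity.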
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence centred on theoretical studies of visual information processing. Its two main goals are to develop image-understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
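A minimal numerical sketch of the regularization idea (toy parameters, not from the paper): recover a smooth "surface" from noisy samples by minimizing a data-fidelity term plus a weighted smoothness term, the standard Tikhonov recipe for stabilizing an ill-posed problem.

```python
import random

random.seed(1)

# Toy Tikhonov regularization (invented parameters): recover a step
# "edge" profile u from noisy samples y by gradient descent on
#   E(u) = sum_i (u_i - y_i)^2 + lam * sum_i (u_{i+1} - u_i)^2,
# i.e. a data term plus a smoothness (stabilizing) term.

n = 40
truth = [0.0] * 20 + [1.0] * 20
y = [v + random.gauss(0.0, 0.2) for v in truth]

lam, step = 2.0, 0.05
u = y[:]                               # initialize from the data
for _ in range(500):
    grad = []
    for i in range(n):
        g = 2.0 * (u[i] - y[i])        # gradient of the data term
        if i > 0:                      # gradient of the smoothness term
            g += 2.0 * lam * (u[i] - u[i - 1])
        if i < n - 1:
            g -= 2.0 * lam * (u[i + 1] - u[i])
        grad.append(g)
    u = [ui - step * gi for ui, gi in zip(u, grad)]

def rms(a, b):
    return (sum((x - z) ** 2 for x, z in zip(a, b)) / len(a)) ** 0.5

rms_noisy = rms(y, truth)    # error of the raw data
rms_reg = rms(u, truth)      # error of the regularized estimate
```

The regularized estimate lies closer to the true profile than the raw data: the smoothness constraint plays the role of the "natural constraint" that makes the ill-posed recovery well-behaved.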
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
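The aperture problem described above can be made concrete with the motion constraint equation ∇I · v = -I_t: a single local measurement fixes only the component of v along the intensity gradient. The numbers below are an invented example.

```python
import math

# Invented numerical example of the aperture problem. The motion
# constraint equation grad(I) . v = -I_t gives one scalar equation for
# the two components of v: only the normal (gradient-direction)
# component of the velocity is measurable locally.

def normal_flow_speed(grad, i_t):
    """Speed along the unit gradient direction, from one measurement."""
    return -i_t / math.hypot(grad[0], grad[1])

v_true = (1.0, 2.0)       # true edge velocity (unknown to the detector)
grad = (1.0, 0.0)         # a vertical edge: gradient along x only
i_t = -(grad[0] * v_true[0] + grad[1] * v_true[1])  # brightness constancy

measured = normal_flow_speed(grad, i_t)
# Only the x (normal) component is recovered; the tangential y component
# of v_true is invisible to this measurement.
```

Recovering the full field therefore requires an added assumption, such as smoothness of the velocity field, which is exactly where the regularization framework enters.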
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~ 1979 T Poggio and D Marr MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term cooperative refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels (computation, algorithms, biophysics and circuits)
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977 …
• … part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I) …
• … which is a follow-up of Werner's argument for starting the Max-Planck-Institut fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\!\left(y_i, f(x_i)\right) + \mu \, \|f\|_K^2 \right]
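For the square loss V(y, f(x)) = (y - f(x))², the minimizer of this functional over an RKHS has the form f(x) = Σ_i c_i K(x_i, x) with (K + µℓI)c = y, by the representer theorem. Below is a self-contained toy instance with a Gaussian kernel; the kernel width and µ are chosen arbitrarily for illustration.

```python
import math

# Toy kernel regularized least squares (invented kernel width and mu):
# solve (K + mu*l*I) c = y, then f(x) = sum_i c_i K(x_i, x).

def k(a, b, sigma=0.2):
    """Gaussian kernel."""
    return math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

xs = [i / 10.0 for i in range(11)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
l, mu = len(xs), 1e-4

# Build A = K + mu*l*I and solve A c = y by Gaussian elimination
# (no pivoting needed: A is symmetric positive definite).
A = [[k(xi, xj) + (mu * l if i == j else 0.0) for j, xj in enumerate(xs)]
     for i, xi in enumerate(xs)]
c = ys[:]
for i in range(l):                       # forward elimination
    for r in range(i + 1, l):
        f = A[r][i] / A[i][i]
        A[r] = [arj - f * aij for arj, aij in zip(A[r], A[i])]
        c[r] -= f * c[i]
for i in reversed(range(l)):             # back substitution
    c[i] = (c[i] - sum(A[i][j] * c[j] for j in range(i + 1, l))) / A[i][i]

def f_hat(x):
    return sum(ci * k(xi, x) for ci, xi in zip(c, xs))

err = max(abs(f_hat(x) - yx) for x, yx in zip(xs, ys))
```

With µ > 0 the linear system is always well-conditioned: the regularizer is what makes the otherwise ill-posed interpolation problem stable.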
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
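The stability property can be illustrated on the simplest possible learner, a regularized constant (mean) hypothesis: deleting one of n training points moves the output by O(1/n). This toy illustrates the general notion, not the paper's formal CV_loo stability definition.

```python
import random

random.seed(7)

# Toy leave-one-out stability: a regularized learner whose output
# changes by O(1/n) when a single training example is deleted.

def learn(labels, mu=0.1):
    """Regularized constant hypothesis:
    argmin_b sum_i (b - y_i)^2 + mu * n * b^2  =  mean(y) / (1 + mu)."""
    n = len(labels)
    return sum(labels) / (n * (1.0 + mu))

betas = []
for n in (10, 100, 1000):
    ys = [random.uniform(-1.0, 1.0) for _ in range(n)]
    full = learn(ys)
    # worst-case change of the hypothesis over all leave-one-out deletions
    beta = max(abs(full - learn(ys[:i] + ys[i + 1:])) for i in range(n))
    betas.append(beta)
# betas shrinks roughly like 1/n: the algorithm is stable.
```

The theorem's content is the converse direction: this kind of vanishing leave-one-out sensitivity, suitably formalized, is what guarantees that empirical error tracks expected error.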
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
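As a toy illustration of ERM, the sketch below (assumptions: a one-dimensional threshold-classifier hypothesis space and a synthetic noisy labelling rule, both invented for the example) minimizes the training error over the hypothesis space and compares empirical with expected error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothesis space: threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(-2, 2, 81)

def empirical_error(t, x, y):
    # I_S[f]: fraction of examples misclassified by the threshold t.
    return np.mean(np.sign(x - t) != y)

# Synthetic rule: y = sign(x - 0.5), labels flipped with probability 0.1.
def sample(n):
    x = rng.uniform(-2, 2, n)
    y = np.sign(x - 0.5)
    flip = rng.random(n) < 0.1
    y[flip] = -y[flip]
    return x, y

x_train, y_train = sample(100)
# ERM: pick the hypothesis that minimizes the training error.
t_hat = thresholds[np.argmin([empirical_error(t, x_train, y_train)
                              for t in thresholds])]

# A large held-out sample estimates the expected error I[f_S].
x_test, y_test = sample(100000)
print("training error:", empirical_error(t_hat, x_train, y_train))
print("expected error (estimate):", empirical_error(t_hat, x_test, y_test))
```

With 100 training points the two errors come out close to each other and to the 10% label noise, which is the generalization property the letter formalizes.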
Box 1. Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,

lim_{n→∞} P{ |X_n − X| ≥ ε } = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a Euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and the output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called the empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,

lim_{n→∞} P{ I[f_S] ≤ inf_{f∈H} I[f] + ε } = 1.
Letters to Nature
Nature, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
Training database:
• 1,000+ real, 3,000+ virtual face patterns
• 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection.
On the market since 2006 (digital cameras).
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT.
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model): animal present or not?
Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms.
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Werner Reichardt's PhD
Werner with Dr. Ruska (center). Photo dated Nov. 17, 1952 (courtesy B. Reichardt)
The four directors of the MPI für Biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt, A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica, Kybernetik 12, 185-203, 1972.
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Parts I and II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982.
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989).
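A minimal sketch of the Reichardt correlation detector described above (a first-order low-pass filter plays the role of the delay; all parameters are arbitrary choices for the illustration):

```python
import numpy as np

def reichardt(left, right, dt=1.0, tau=5.0):
    """Hassenstein-Reichardt correlation detector (minimal sketch).

    Each subunit multiplies one photoreceptor signal by a low-pass
    (delayed) copy of its neighbour's signal; the detector output is
    the difference of the two mirror-symmetric subunits."""
    def lowpass(s):
        out = np.zeros_like(s)
        a = dt / tau
        for i in range(1, len(s)):
            out[i] = out[i - 1] + a * (s[i] - out[i - 1])
        return out
    return lowpass(left) * right - left * lowpass(right)

# A grating drifting left-to-right reaches the left receptor first.
t = np.arange(0, 200, 1.0)
left = np.sin(2 * np.pi * t / 40)
right = np.sin(2 * np.pi * (t - 5) / 40)   # same signal, delayed

print(np.mean(reichardt(left, right)))     # positive: preferred direction
print(np.mean(reichardt(right, left)))     # negative: null direction
```

The sign of the time-averaged output reports the direction of motion, even though each receptor alone sees only an alternation of dark and light: the nonlinearity (multiplication) is essential, exactly as Hassenstein & Reichardt argued.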
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
†Università di Genova, Istituto di Fisica, Genoa, Italy; ‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
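The Barlow & Levick veto scheme can be sketched as an 'AND-NOT' operation: a delayed signal from the neighbouring receptor suppresses the excitatory channel. This is a toy discrete-time sketch of the logic, not the synaptic implementation the paper proposes:

```python
import numpy as np

def barlow_levick(ch1, ch2, delay=3):
    """Veto ('AND-NOT') scheme for directional selectivity (sketch).

    The response driven by channel 1 is suppressed whenever a delayed
    signal from the neighbouring channel 2 is present: motion in the
    null direction brings the delayed veto into register with the
    excitation, while preferred-direction motion escapes it."""
    veto = np.concatenate([np.zeros(delay), ch2[:-delay]])
    return ch1 * (1.0 - np.clip(veto, 0, 1))

# A bright edge passing receptor 2 then receptor 1 (null direction)
# is vetoed; the opposite order (preferred direction) is not.
ch2 = np.zeros(20); ch2[5:8] = 1.0      # receptor 2 fires first
ch1 = np.zeros(20); ch1[8:11] = 1.0     # receptor 1 fires 3 steps later
null = barlow_levick(ch1, ch2).sum()
preferred = barlow_levick(ch2, ch1).sum()
print(preferred, null)                  # prints 3.0 0.0
```

The contrast with the Reichardt scheme is the nonlinearity: inhibition that vetoes a coincidence (AND-NOT) rather than multiplication that detects one.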
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images[5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman[6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
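In the regularization framework the paper introduces, such underdetermined problems are made well-posed by adding a smoothness term to the data term. A one-dimensional sketch of standard Tikhonov regularization follows (the sampling operator, smoothness operator and parameters below are invented for the example; the paper's optical-flow and surface-reconstruction functionals have the same structure):

```python
import numpy as np

# Tikhonov regularization sketch: reconstruct a smooth 1-D "surface"
# from sparse, noisy samples by minimizing
#     ||A f - d||^2 + lam * ||D f||^2,
# where A samples f at the measured points and D is a discrete
# second-difference (smoothness) operator.
rng = np.random.default_rng(3)
n = 100
x = np.linspace(0, 1, n)
truth = np.sin(2 * np.pi * x)

idx = rng.choice(n, 15, replace=False)          # sparse measurement points
A = np.zeros((len(idx), n)); A[np.arange(len(idx)), idx] = 1.0
d = truth[idx] + 0.05 * rng.normal(size=len(idx))

D = np.diff(np.eye(n), 2, axis=0)               # second differences
lam = 1e-3
# Normal equations: (A^T A + lam D^T D) f = A^T d.  Without the
# regularizer the system is singular (the problem is ill-posed).
f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ d)

print("rms error:", np.sqrt(np.mean((f - truth) ** 2)))
```

The same structure carries over to the velocity-field problem above: the data term constrains only the normal component of V along the contour, and the smoothness term selects a unique full velocity field among the many consistent with the measurements.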
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
Cooperative neural network for stereo
~1979: T. Poggio and D. Marr, MPI Tübingen
)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz
1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about …, whereas for the mammalian cortex it lies between … and ….

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
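The cooperative stereo algorithm of the paper cited above can be sketched in a few lines. This is a simplified one-dimensional toy version, not the published two-dimensional implementation: the neighbourhood radius, inhibition weight, threshold and input are arbitrary illustrative choices.

```python
import numpy as np

def cooperative_stereo(init, n_iter=15, radius=2, w_inhib=2.0, theta=3.0):
    """Toy 1-D sketch of a cooperative stereo algorithm.

    init[x, d] = 1 means "a match between the two images is possible at
    position x and disparity d".  Each iteration a cell sums excitatory
    support from nearby positions at the same disparity (continuity
    constraint), subtracts inhibition from competing disparities at the
    same position (uniqueness constraint), adds the initial match data,
    and thresholds.
    """
    state = init.astype(float).copy()
    for _ in range(n_iter):
        # excitatory support from neighbouring positions, same disparity
        support = sum(np.roll(state, dx, axis=0)
                      for dx in range(-radius, radius + 1) if dx != 0)
        # inhibition from all other disparities at the same position
        inhibition = state.sum(axis=1, keepdims=True) - state
        state = (support - w_inhib * inhibition + init >= theta).astype(float)
    return state

# toy input: true matches on the disparity-2 plane plus ~20% random false matches
rng = np.random.default_rng(0)
init = (rng.random((40, 5)) < 0.2).astype(float)
init[:, 2] = 1.0
out = cooperative_stereo(init)
print("disparity-2 plane kept:", out[:, 2].mean(),
      " false matches kept:", out[:, [0, 1, 3, 4]].mean())
```

On this toy input the continuity and uniqueness constraints drive the network to keep the coherent disparity plane and suppress the scattered false matches.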
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \lVert f \rVert_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
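When V is the square loss and H is a reproducing-kernel Hilbert space, the regularization functional above has a closed-form minimizer via the representer theorem. A minimal sketch: the Gaussian kernel and the values of mu and gamma are arbitrary illustrative choices.

```python
import numpy as np

def krls(X, y, mu=1e-3, gamma=1.0):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over an RKHS.

    By the representer theorem the minimizer is f(x) = sum_i c_i K(x, x_i)
    with c = (K + mu * n * I)^(-1) y.  Gaussian kernel, 1-D inputs."""
    n = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda x: np.exp(-gamma * (x[:, None] - X[None, :]) ** 2) @ c

# fit noisy samples of sin(x) and check the recovered function on a grid
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(X) + 0.1 * rng.normal(size=60)
f = krls(X, y)
grid = np.linspace(-2.5, 2.5, 101)
err = np.max(np.abs(f(grid) - np.sin(grid)))
print("max deviation from sin on [-2.5, 2.5]:", err)
```

The regularizer mu * ||f||_K^2 trades data fit against smoothness; with mu = 0 the linear system becomes ill-conditioned and the solution chases the noise.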
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000, and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1–5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.

What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\big(|X_n - X| > \varepsilon\big) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z) \, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \big| I[f_S] - I_S[f_S] \big| = 0 \text{ in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} P\Big( I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 1$$
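The definitions in Box 1 can be exercised on a toy problem: for ERM with square loss over affine hypotheses, the empirical error $I_S[f_S]$ tracks a Monte Carlo estimate of the expected error $I[f_S]$ more and more closely as $n$ grows. The distribution $\mu$ below is an arbitrary toy choice.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample(n):
    """n i.i.d. draws from an assumed toy distribution mu(x, y)."""
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + 0.1 * rng.normal(size=n)

def erm_affine(x, y):
    """ERM with square loss over the hypothesis space of affine functions."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    return lambda t: w[0] * t + w[1]

for n in (10, 100, 1000):
    x, y = sample(n)
    f = erm_affine(x, y)
    emp = np.mean((f(x) - y) ** 2)      # I_S[f_S], computable from S
    xt, yt = sample(200_000)            # Monte Carlo stand-in for I[f_S]
    exp_err = np.mean((f(xt) - yt) ** 2)
    print(n, emp, exp_err, abs(exp_err - emp))
gap = abs(exp_err - emp)                # generalization gap at n = 1000
```

Since the target $x^2$ is outside the affine hypothesis space, the expected error converges to the best-in-class error rather than zero; the point of the demonstration is only that the gap $|I[f_S] - I_S[f_S]|$ shrinks.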
Letters to Nature
Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work
• Training database
• 1,000+ real, 3,000+ virtual (face patterns)
• 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain
– 10^10–10^11 neurons (~1 million flies)
– 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
– ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
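The "matrix-like read-out" idea is simply a linear classifier applied to a population response vector. The sketch below uses a synthetic population, with all numbers invented for illustration, not the actual IT recordings of Hung et al.; it only shows that a least-squares linear readout can decode category from such a code.

```python
import numpy as np

rng = np.random.default_rng(5)
n_neurons, n_trials = 128, 400

# synthetic population: each trial's response mixes a category signal with noise
signal = rng.normal(size=n_neurons)             # category axis in response space
labels = rng.integers(0, 2, n_trials) * 2 - 1   # two object categories, +1 / -1
R = np.outer(labels, signal) + 2.0 * rng.normal(size=(n_trials, n_neurons))

# matrix-like linear readout: least-squares weights fit on half the trials
w = np.linalg.lstsq(R[:200], labels[:200].astype(float), rcond=None)[0]
acc = np.mean(np.sign(R[200:] @ w) == labels[200:])
print("decoding accuracy on held-out trials:", acc)
```

The same readout matrix applied to responses from shifted or rescaled stimuli is what the experiments used to test invariance; this toy version only demonstrates the decoding step itself.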
… in 2013 …
The four directors of the MPI für biologische Kybernetik
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik, 12, 185–203, 1972.
Cognition in flies probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch., 35c, 811–815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory, "Visual control of orientation behaviour in the fly", Parts I + II, Quart. Rev. Biophysics, 9(3), 311–375.
open question how well does this theory describe fly behavior of natural flight
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
single frame analysis 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly hellip
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics, 45, 123–130, 1982.
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science, 190, 479–480, 1975.
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature, 1989)
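The correlation scheme described above can be sketched directly: delay one receptor's signal, multiply it with the neighbour's, and subtract the mirror-symmetric subunit. The stimulus and parameters below are arbitrary illustrative choices.

```python
import numpy as np

def reichardt(left, right, delay=3):
    """Hassenstein-Reichardt correlation detector (sketch).

    Each subunit multiplies one receptor signal with a delayed copy of its
    neighbour's; subtracting the mirror-symmetric subunit gives an opponent
    output whose time-averaged sign follows the direction of motion."""
    def delayed(s):
        d = np.zeros_like(s)
        d[delay:] = s[:-delay]
        return d
    return delayed(left) * right - left * delayed(right)

# a sinusoidal luminance pattern drifting past two nearby photoreceptors
t = np.arange(200)
phase = 0.3   # spatial phase lag between the two receptors
# pattern reaches the right receptor later: motion from left to right
rightward = reichardt(np.sin(0.2 * t), np.sin(0.2 * t - phase))
leftward = reichardt(np.sin(0.2 * t), np.sin(0.2 * t + phase))
print(rightward.mean(), leftward.mean())
```

Averaged over time, the opponent output is positive for one direction of motion and negative for the other, even though each photoreceptor alone sees only a modulation of light.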
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry Reichardt Poggio Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409–416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directionalselectivity to motion
By V. Torre† and T. Poggio‡
†Università di Genova, Istituto di Fisica, Genoa, Italy
‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
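The veto idea can be sketched numerically. This is a caricature of the shunting-inhibition mechanism, not the paper's biophysical model: excitation is divided, rather than reduced subtractively, by a delayed inhibitory signal from the neighbouring receptor, and all pulse timings and gains are arbitrary illustrative choices.

```python
import numpy as np

def veto_unit(receptor_a, receptor_b, delay=3, g=10.0):
    """Sketch of direction selectivity through a delayed, divisive veto.

    Excitation from receptor B is shunted (divided) by a delayed inhibitory
    signal from the neighbouring receptor A.  Motion A -> B makes the delayed
    inhibition coincide with the excitation and suppresses the response;
    motion B -> A leaves it intact."""
    inhib = np.zeros_like(receptor_a)
    inhib[delay:] = receptor_a[:-delay]
    return receptor_b / (1.0 + g * inhib)

def pulse(t0, n=60, width=5):
    s = np.zeros(n)
    s[t0:t0 + width] = 1.0
    return s

null = veto_unit(pulse(10), pulse(13)).sum()   # A then B: inhibition vetoes
pref = veto_unit(pulse(13), pulse(10)).sum()   # B then A: inhibition too late
print("preferred:", pref, " null:", null)
```

The divisive (shunting) form matters: for strong inhibitory gain it approximates an AND-NOT operation, which is how a synaptic interaction can implement the nonlinearity that both the Reichardt and the Barlow-Levick schemes require.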
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image-understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
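The aperture-problem argument above can be made concrete in a few lines. This is an illustrative sketch (the function name and the toy numbers are ours, not the paper's): from the brightness-constancy constraint ∇I · v + I_t = 0, a purely local measurement determines only the velocity component along the intensity gradient, so the tangential component of the true motion is invisible.

```python
import math

def normal_flow(grad, It):
    """Normal component of image velocity recoverable from local data.

    From the brightness-constancy constraint  grad(I) . v + I_t = 0,
    a local measurement determines only the component of v along the
    intensity gradient (perpendicular to the contour): v_n = -I_t / |grad(I)|.
    """
    gx, gy = grad
    mag = math.hypot(gx, gy)
    vn = -It / mag
    return (vn * gx / mag, vn * gy / mag)   # vector along the gradient

# A vertical edge (gradient along x) translating with true velocity (1.0, 0.7):
true_v = (1.0, 0.7)
grad = (2.0, 0.0)
It = -(grad[0] * true_v[0] + grad[1] * true_v[1])   # I_t = -grad(I) . v
vx, vy = normal_flow(grad, It)
assert abs(vx - 1.0) < 1e-12 and abs(vy) < 1e-12    # tangential 0.7 is lost
```

The recovered vector matches the true motion only in the gradient direction; the tangential part must come from added constraints, exactly as the text argues.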
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
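The multiplicative (Hassenstein-Reichardt) scheme can be sketched as a discrete-time toy model. This is an illustrative simplification, not the original analog model: the one-pole low-pass filter stands in for the delayed channel, and the time constant, stimulus, and opponent-subtraction stage are our choices.

```python
import math

def reichardt_response(signal, dt=1.0, tau=5.0):
    """Correlation-type motion detector (Hassenstein-Reichardt sketch).

    `signal` is a sequence of (left, right) receptor samples.  Each
    receptor feeds a low-pass filtered (delayed) copy to the *opposite*
    multiplier; the two products are subtracted, so the sign of the
    accumulated output codes the direction of motion.
    """
    a = math.exp(-dt / tau)            # one-pole low-pass coefficient
    lp_left = lp_right = 0.0
    total = 0.0
    for left, right in signal:
        lp_left = a * lp_left + (1 - a) * left
        lp_right = a * lp_right + (1 - a) * right
        total += lp_left * right - lp_right * left   # opponent stage
    return total

def moving_bar(direction, n=40):
    """A bright bar passing receptor 1 then receptor 2 (or the reverse)."""
    first = [1.0 if 10 <= t < 20 else 0.0 for t in range(n)]
    second = [1.0 if 15 <= t < 25 else 0.0 for t in range(n)]
    pairs = list(zip(first, second))
    return pairs if direction > 0 else [(b, a) for a, b in pairs]

assert reichardt_response(moving_bar(+1)) > 0   # preferred direction
assert reichardt_response(moving_bar(-1)) < 0   # null direction
```

The key property the text emphasizes survives in the sketch: the interaction is nonlinear (a product of the two channels), and reversing the stimulus order flips the sign of the output.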
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
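Marr and Poggio (1976) describe a cooperative algorithm for computing stereo disparity from local constraints. The following is a toy 1-D sketch in their spirit, not the paper's 2-D implementation: the neighbourhood size, inhibition weight, threshold, and stimulus are our illustrative choices, and inhibition is applied only at each position (uniqueness) rather than along both lines of sight.

```python
import random

def cooperative_stereo(left, right, max_d=3, iters=8):
    """Simplified 1-D sketch of a cooperative stereo algorithm.

    State C[x][d] encodes "position x has disparity d".  Each iteration
    adds excitatory support from neighbours at the same disparity (the
    continuity constraint), subtracts inhibition from competing
    disparities at the same position (the uniqueness constraint), and
    thresholds the result.
    """
    n = len(left)
    # initial state: 1 wherever the two images match at offset d
    C0 = [[1.0 if x + d < n and left[x] == right[x + d] else 0.0
           for d in range(max_d)] for x in range(n)]
    C = [row[:] for row in C0]
    for _ in range(iters):
        new = [[0.0] * max_d for _ in range(n)]
        for x in range(n):
            for d in range(max_d):
                excite = sum(C[x2][d] for x2 in range(max(0, x - 2), min(n, x + 3)))
                inhibit = sum(C[x][d2] for d2 in range(max_d) if d2 != d)
                new[x][d] = 1.0 if excite - inhibit + C0[x][d] >= 3.5 else 0.0
        C = new
    return [max(range(max_d), key=lambda d: C[x][d]) for x in range(n)]

random.seed(0)
n = 30
right = [random.randint(0, 1) for _ in range(n)]
left = [right[(x + 2) % n] for x in range(n)]   # whole scene shifted: true disparity 2
disp = cooperative_stereo(left, right)
# spurious binary matches at wrong disparities die out; the true layer survives
assert sum(1 for x in range(5, n - 5) if disp[x] == 2) >= 15
```

The true-disparity layer receives full excitatory support everywhere, so it persists, while accidental matches lack coherent neighbourhood support and are extinguished over the iterations.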
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for their continuing investigation.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  – computation
  – algorithms
  – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \mu \|f\|_K^2 \right]$$
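For the square loss, the minimizer of the Tikhonov regularization functional min over f in H of (1/ℓ) Σ V(y_i, f(x_i)) + μ‖f‖²_K has, by the representer theorem, the form f(x) = Σ_i c_i K(x, x_i) with coefficients solving (K + μℓI)c = y. A self-contained sketch (Gaussian kernel; the toy data, kernel width and μ are our illustrative choices):

```python
import math

def gaussian_kernel(a, b, sigma=0.3):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting (A is n x n)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def regularized_fit(xs, ys, mu=1e-4):
    """Kernel ridge regression: minimize the functional above for the
    square loss; the solution is f(x) = sum_i c_i K(x, x_i) where the
    coefficients satisfy (K + mu * l * I) c = y."""
    l = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j]) for j in range(l)] for i in range(l)]
    A = [[K[i][j] + (mu * l if i == j else 0.0) for j in range(l)] for i in range(l)]
    c = solve(A, list(ys))
    return lambda x: sum(c[i] * gaussian_kernel(x, xs[i]) for i in range(l))

xs = [i / 10 for i in range(11)]
ys = [math.sin(2 * x) for x in xs]
f = regularized_fit(xs, ys)
assert max(abs(f(x) - y) for x, y in zip(xs, ys)) < 0.1
```

The μℓI term on the diagonal is exactly the stabilizer the regularization framework calls for: it makes the otherwise ill-conditioned interpolation problem well-posed.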
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio1, Ryan Rifkin1,4, Sayan Mukherjee1,3 & Partha Niyogi2
1Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
2Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
3Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
4Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1-5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
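The leave-one-out stability notion in the abstract can be probed empirically for any learning map by retraining with one example deleted and measuring how much the hypothesis moves. A toy sketch with a trivially stable algorithm (the constant mean-output predictor; the function names, data and grid are our illustrative choices):

```python
def learn(S):
    """A toy learning algorithm: predict the mean output for every input."""
    ys = [y for _, y in S]
    m = sum(ys) / len(ys)
    return lambda x: m

def loo_stability(S, grid):
    """Largest change in the learned hypothesis when one example is deleted."""
    f = learn(S)
    worst = 0.0
    for i in range(len(S)):
        fi = learn(S[:i] + S[i + 1:])
        worst = max(worst, max(abs(f(x) - fi(x)) for x in grid))
    return worst

S = [(x, x % 7) for x in range(100)]   # bounded outputs in [0, 6]
grid = [0, 1, 2]
# deleting one of n examples moves the mean by |y_i - mean| / (n - 1),
# so the change is at most (max - min) / (n - 1) and vanishes as n grows
beta = loo_stability(S, grid)
assert 0.0 < beta <= 6.0 / 99
```

For this algorithm the perturbation shrinks like 1/n, which is the kind of quantitative stability the paper connects to generalization; unstable procedures (for example, exact interpolation of noisy data) would show a leave-one-out change that does not vanish.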
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$, $\lim_{n\to\infty} \mathbb{P}(|X_n - X| \ge \epsilon) = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left( I[f_S] \le \inf_{f \in H} I[f] + \epsilon \right) = 1.$$
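The gap between empirical and expected error can be watched directly for a single fixed hypothesis. A toy sketch (the distribution, noise level and seed are our illustrative choices) where the expected square-loss error is known in closed form, so the empirical error can be compared against it:

```python
import random

def V(f, z):
    """Square loss."""
    x, y = z
    return (f(x) - y) ** 2

def sample(n, rng):
    """Draw z = (x, y) with x uniform on [0, 1] and y = x + Gaussian noise."""
    return [(x, x + rng.gauss(0.0, 0.1))
            for x in (rng.random() for _ in range(n))]

f = lambda x: x          # a fixed hypothesis (the noiseless regression function)
I_true = 0.01            # I[f] = E[(y - x)^2] = Var(noise) = 0.1 ** 2

rng = random.Random(1)

def empirical_error(n):
    S = sample(n, rng)
    return sum(V(f, z) for z in S) / n

# the empirical error concentrates around the expected error as n grows
assert abs(empirical_error(50) - I_true) < 0.05
assert abs(empirical_error(50_000) - I_true) < 0.005
```

For a single fixed function this is just the law of large numbers; the point of the theory in the surrounding text is the much stronger requirement that the same convergence hold for the data-dependent function f_S chosen by the algorithm.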
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database
• 1000+ Real, 3000+ VIRTUAL
• 500,000+ Non-Face Pattern
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Trial sequence: image (20 ms) → image-mask interval (30 ms ISI) → mask, 1/f noise (80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
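The core computation in these hierarchical feedforward models can be sketched in a few lines. This is a deliberately minimal, illustrative reduction (the filters, sizes and the single layer pair are made up), not the Riesenhuber & Poggio or Serre et al. implementation: an S layer template-matches, and a C layer max-pools over position, trading selectivity for invariance.

```python
import numpy as np

rng = np.random.default_rng(1)

def s_layer(image, templates):
    """S layer: slide each template over the image; one response map per template."""
    h = templates.shape[1]
    maps = []
    for t in templates:
        resp = np.array([[np.sum(image[i:i + h, j:j + h] * t)
                          for j in range(image.shape[1] - h + 1)]
                         for i in range(image.shape[0] - h + 1)])
        maps.append(resp)
    return np.stack(maps)

def c_layer(s_maps):
    """C layer: max over all positions, so the response is translation-invariant."""
    return s_maps.max(axis=(1, 2))

templates = rng.normal(size=(4, 3, 3))
image = np.zeros((12, 12))
image[2:5, 2:5] = templates[0]        # embed template 0 at one position
shifted = np.zeros((12, 12))
shifted[6:9, 7:10] = templates[0]     # same pattern, different position

c1 = c_layer(s_layer(image, templates))
c2 = c_layer(s_layer(shifted, templates))
print(np.allclose(c1, c2))  # True: C responses unchanged under translation
```

Stacking several such S/C pairs yields units that are both selective for complex patterns and tolerant to position and scale, the signature of the ventral stream hierarchy described above.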
Decoding the neural code: matrix-like read-out from the brain
Agreement of model w/ IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
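The "matrix-like read-out" idea, in the spirit of Hung et al. (2005), is that object category can be decoded from a population response vector by a simple linear classifier. The sketch below is illustrative only: all response statistics are simulated, not recorded IT data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_trials = 64, 200

# Hypothetical category preferences: each neuron carries a small signed signal.
w_true = rng.normal(size=n_neurons)
labels = rng.choice([-1.0, 1.0], n_trials)
# Population response = category signal across neurons + independent trial noise.
R = np.outer(labels, w_true) + rng.normal(scale=2.0, size=(n_trials, n_neurons))

# Least-squares linear readout: one weight per neuron.
w = np.linalg.pinv(R) @ labels

# Decode held-out trials by the sign of the weighted population response.
R_test = np.outer(labels, w_true) + rng.normal(scale=2.0, size=(n_trials, n_neurons))
acc = np.mean(np.sign(R_test @ w) == labels)
print(f"decoding accuracy: {acc:.2f}")
```

Even though each simulated neuron is individually noisy, the linear readout pools the population and decodes the category almost perfectly, which is the basic logic behind reading out category and identity from small IT populations.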
… in 2013 …
The beautiful eyes of flies
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory: Visual control of orientation behaviour in the fly, Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
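The correlation scheme described above can be sketched as a discrete-time toy model. This is an illustrative reduction (the delay and grating parameters are invented, and a simple sample delay stands in for the low-pass filter): each half-detector multiplies one receptor's delayed signal by its neighbour's undelayed signal, and subtracting the mirror-symmetric half gives a signed, direction-selective output.

```python
import numpy as np

def reichardt(stimulus, delay=2):
    """stimulus: array (time, space); mean opponent output over detectors."""
    a = stimulus[:, :-1]             # left receptor of each adjacent pair
    b = stimulus[:, 1:]              # right receptor
    a_d = np.roll(a, delay, axis=0)  # delayed copies (low-pass stand-in)
    b_d = np.roll(b, delay, axis=0)
    # opponent subtraction of the two multiplicative half-detectors;
    # drop the first rows, where np.roll wraps around in time
    out = a_d[delay:] * b[delay:] - b_d[delay:] * a[delay:]
    return out.mean()

t, x = np.meshgrid(np.arange(200), np.arange(20), indexing="ij")
rightward = np.sin(2 * np.pi * (x - 0.1 * t) / 10)  # grating drifting right
leftward = np.sin(2 * np.pi * (x + 0.1 * t) / 10)   # grating drifting left

print(reichardt(rightward) > 0, reichardt(leftward) < 0)  # True True
```

The output is positive for one direction and negative for the other, which is exactly the signed opponent signal the optomotor experiments on Chlorophanus revealed.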
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species, with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
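The veto scheme can be illustrated with a toy Barlow & Levick style unit. This sketch is not the paper's biophysical model: here a delayed signal from the adjacent receptor 'vetoes' excitation via a divisive (shunting-like) term g_e / (1 + k g_i), and all pulse timings and the gain k are invented for illustration.

```python
import numpy as np

def and_not_unit(exc, inh, delay=2, k=10.0):
    """AND-NOT unit: delayed inhibition divisively vetoes excitation."""
    inh_d = np.concatenate([np.zeros(delay), inh[:-delay]])  # delayed veto signal
    return np.sum(exc / (1.0 + k * inh_d))

def pulse(t_on, n=40, width=3):
    """A brief receptor activation starting at time t_on."""
    s = np.zeros(n)
    s[t_on:t_on + width] = 1.0
    return s

# Two receptors: in the preferred direction the excitatory input fires first,
# so the delayed veto arrives too late; in the null direction they coincide.
preferred = and_not_unit(exc=pulse(10), inh=pulse(14))
null = and_not_unit(exc=pulse(14), inh=pulse(10))
print(preferred > null)  # True: the delayed veto suppresses the null direction
```

For small inhibitory signals the divisive veto behaves approximately like the multiplicative nonlinearity of the Reichardt scheme, which is the link the paper draws between the two analyses.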
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
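The regularization recipe for such underdetermined problems can be shown on a toy example. Everything below is invented for illustration (the signal, the sampling pattern and the value of lambda are not from the paper): only a few values of a smooth signal are observed, and a smoothness penalty lambda * ||D x||^2, with D a discrete first-difference operator, selects a unique, physically plausible solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam = 50, 1.0
truth = np.sin(np.linspace(0, np.pi, n))   # the unknown smooth signal

idx = np.arange(0, n, 4)                   # sparse measurement locations
A = np.eye(n)[idx]                         # measurement operator (subsampling)
b = truth[idx] + rng.normal(scale=0.01, size=idx.size)   # noisy data

D = np.diff(np.eye(n), axis=0)             # first-difference (smoothness) operator
# Tikhonov regularization: minimize ||A x - b||^2 + lam * ||D x||^2,
# whose normal equations are (A^T A + lam D^T D) x = A^T b.
x = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)

print(f"max reconstruction error: {np.abs(x - truth).max():.3f}")
```

Without the regularizer the system A x = b is underdetermined (37 of the 50 unknowns are unobserved); the stabilizing functional restores uniqueness, which is the sense in which regularization turns an ill-posed early vision problem into a well-posed one.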
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the sharp changes in image intensity that mark physical edges.
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
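The cooperative idea behind the paper can be sketched in a 1-D toy network. The parameters below (window size, threshold, number of iterations, the 1-D reduction itself) are invented for illustration and are not the paper's: nodes C[x, d] vote for disparity d at position x, and iterated local excitation within a disparity layer, inhibition among rival disparities at the same position, plus a persistent input term drive the network toward a consistent disparity map.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_disp, true_d = 40, 5, 2

left = rng.integers(0, 4, n + n_disp)   # a 1-D "random-dot" line, 4 dot types
right = np.roll(left, true_d)           # right image: shifted copy of the left

# C0[x, d] = 1 wherever left and right features match at candidate disparity d
C0 = np.stack([left[:n] == np.roll(right, -d)[:n] for d in range(n_disp)],
              axis=1).astype(float)
C = C0.copy()

for _ in range(8):
    # excitation: active neighbours within the same disparity layer
    excite = np.stack([np.convolve(C[:, d], np.ones(5), mode="same")
                       for d in range(n_disp)], axis=1)
    # inhibition: rival disparities competing for the same image position
    inhibit = C.sum(axis=1, keepdims=True) - C
    # threshold update, keeping the initial matches as a persistent input
    C = ((excite - inhibit + C0) > 3.5).astype(float)

acc = np.mean(C.argmax(axis=1) == true_d)
print(f"fraction of positions assigned the true disparity: {acc:.2f}")
```

False matches form only short runs, so inhibition from the globally consistent disparity layer extinguishes them while excitation sustains the true layer: local interactions yield a global organization, the hallmark of a cooperative algorithm.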
Science is currently published by American Association for the Advancement of Science
Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use
Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission
JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg
httpwwwjstororgMon Jan 22 124953 2007
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + μ ||f||_K² ]
Predictive regularization algorithms
Theorems on foundations of learning
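With the square loss, the regularization functional above is minimized, by the representer theorem, at f(x) = Σ_j c_j K(x, x_j) with c = (K + μℓI)⁻¹ y. A minimal sketch of this regularized least squares scheme (the Gaussian kernel, function names, and toy 1-D data are illustrative assumptions):

```python
import numpy as np

def rls_fit(X, y, mu=0.1, gamma=1.0):
    """Solve min (1/l) sum (y_i - f(x_i))^2 + mu ||f||_K^2 over an RKHS.
    By the representer theorem, f(x) = sum_j c_j K(x, x_j) with
    c = (K + mu * l * I)^{-1} y, for a Gaussian kernel K."""
    l = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    return np.linalg.solve(K + mu * l * np.eye(l), y)

def rls_predict(X_train, c, x, gamma=1.0):
    """Evaluate the learned function at a new point x."""
    k = np.exp(-gamma * (x - X_train) ** 2)
    return k @ c
```

With weak regularization and smooth data, the fit tracks the target function closely; increasing μ trades data fit for a smaller norm in the hypothesis space.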
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
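The stability property described in the abstract can be probed numerically: train on S, delete one example, retrain, and measure how much the hypothesis changes at the deleted point. The sketch below uses regularized least squares as the learning map; `deletion_stability` is a hypothetical helper, and this max-change proxy is far cruder than the paper's formal stability definitions.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=20.0):
    # narrow Gaussian kernel, so the weakly regularized fit can nearly interpolate
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

def rls(X, y, mu):
    """Regularized least squares; returns the learned function f_S."""
    l = len(X)
    K = gaussian_kernel(X, X)
    c = np.linalg.solve(K + mu * l * np.eye(l), y)
    return lambda x: gaussian_kernel(np.atleast_1d(x), X)[0] @ c

def deletion_stability(X, y, mu):
    """Largest change in the prediction at a deleted training point when
    that one example is removed from S (a crude proxy for the stability
    notion discussed in the text)."""
    f = rls(X, y, mu)
    n = len(X)
    changes = []
    for i in range(n):
        mask = np.arange(n) != i
        f_i = rls(X[mask], y[mask], mu)
        changes.append(abs(f(X[i]) - f_i(X[i])))
    return max(changes)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 25))
y = np.sin(X) + 0.3 * rng.normal(size=25)
beta_weak = deletion_stability(X, y, mu=1e-5)    # barely regularized
beta_strong = deletion_stability(X, y, mu=1.0)   # strongly regularized
```

More regularization makes the learning map more stable: the strongly regularized fit barely moves when an example is deleted, while the near-interpolating fit changes by roughly the noise level at the deleted point.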
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = (z_1, …, z_n) = ((x_1, y_1), …, (x_n, y_n)).

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as

I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,

lim_{n→∞} P( I[f_S] > inf_{f∈H} I[f] + ε ) = 0.
letters to nature
Nature, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
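The Box 1 quantities are easy to instantiate numerically. The sketch below fixes an assumed distribution μ in which the expected error of the hypothesis f(x) = x is exactly σ² under the square loss, and checks that the empirical error I_S[f] approaches it as n grows; the distribution and the fixed hypothesis are illustrative choices, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3

def sample(n):
    """n i.i.d. draws from an assumed mu(x, y): x uniform on [0, 1], y = x + noise."""
    x = rng.uniform(0, 1, n)
    return x, x + sigma * rng.normal(size=n)

def empirical_error(f, x, y):
    """I_S[f] = (1/n) sum_i V(f, z_i) with the square loss."""
    return np.mean((f(x) - y) ** 2)

f = lambda x: x   # fixed hypothesis; its expected error I[f] is exactly sigma**2
gaps = [abs(empirical_error(f, *sample(n)) - sigma ** 2) for n in (100, 100_000)]
```

As n grows, the gap |I_S[f] − I[f]| shrinks toward zero, which is the convergence the generalization definition formalizes for the learned f_S.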
Why do hierarchical architectures work?
• Training database: 1000+ real, 3000+ virtual; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
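The view-tuned units reported by Logothetis et al., and predicted by the model, can be caricatured as bell-shaped tuning curves over viewpoint; an ensemble of such units, each a "blurred template" for one view, is far less view-dependent than any single unit. A toy sketch, in which the Gaussian tuning width and the four preferred views are arbitrary illustrative choices:

```python
import numpy as np

def view_tuned_response(view, preferred, width=30.0):
    """Bell-shaped tuning over viewpoint (degrees): the response falls off
    as the object is rotated away from the unit's preferred view."""
    d = (view - preferred + 180) % 360 - 180   # wrapped angular difference
    return np.exp(-d ** 2 / (2 * width ** 2))

preferred_views = np.array([0.0, 90.0, 180.0, 270.0])   # a small ensemble

def population_output(view):
    """Pooled activity of the ensemble of view-tuned units."""
    return view_tuned_response(view, preferred_views).sum()
```

A single unit's response collapses for unfamiliar rotations, while the pooled ensemble stays comparatively flat across all views, a crude version of view-invariant recognition from view-tuned cells.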
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
[Trial sequence: image 20 ms; blank interval 30 ms (ISI); 1/f-noise mask 80 ms]
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Parts I+II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff. Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to Bayesian approach to cognition in humans … no neurons)
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits
bull Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
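The Hassenstein-Reichardt correlation detector named in these bullets delays the signal from one photoreceptor, multiplies it with the undelayed signal of its neighbour, and subtracts the mirror-symmetric half. A minimal sketch on a drifting sinusoid; the delay length, the phase offset between receptors, and the circular-shift delay model are illustrative assumptions.

```python
import numpy as np

def reichardt(s1, s2, delay):
    """Correlation-type (Hassenstein-Reichardt) motion detector: each half
    multiplies the delayed signal of one receptor with the undelayed signal
    of its neighbour; the opponent stage subtracts the two halves."""
    d1 = np.roll(s1, delay)   # delayed channel (circular, for a periodic stimulus)
    d2 = np.roll(s2, delay)
    return np.mean(d1 * s2 - d2 * s1)

# a moving sinusoidal grating sampled by two receptors a small distance apart:
# receptor 2 sees the same waveform with a phase lag (or lead) of 0.5 rad
t = np.linspace(0, 10, 2000)
omega, phase = 2 * np.pi, 0.5
rightward = reichardt(np.sin(omega * t), np.sin(omega * t - phase), delay=20)
leftward = reichardt(np.sin(omega * t), np.sin(omega * t + phase), delay=20)
```

The time-averaged output is positive for one direction of motion and negative for the other, which is the direction-selective signature the beetle and fly experiments revealed.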
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits the beetle (and the fly) relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
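The two schemes contrasted here can be reduced to toy discrete-time units: a Reichardt-style multiplicative conjunction of delayed and direct signals, and a Barlow-Levick-style AND-NOT in which a delayed signal vetoes the neighbour's excitation. Both come out direction-selective, by opposite mechanisms. The binary bar stimulus and one-step delay are illustrative simplifications; the veto unit here would also fire to a lone flash, a known limitation this sketch does not correct.

```python
import numpy as np

def responses(first, second):
    """Binary receptor signals over discrete time: receptor 1 fires at step
    `first`, receptor 2 at step `second` (a bar crossing the pair)."""
    s1 = np.zeros(6)
    s2 = np.zeros(6)
    s1[first] = 1.0
    s2[second] = 1.0
    return s1, s2

def multiplicative(s1, s2):
    """Reichardt-style conjunction: delayed s1 multiplied with direct s2."""
    return np.sum(np.roll(s1, 1) * s2)

def veto(s1, s2):
    """Barlow-Levick-style AND-NOT: s2's excitation vetoed by delayed s1."""
    return np.sum(s2 * (1 - np.roll(s1, 1)))
```

For a bar moving from receptor 1 to receptor 2, the multiplicative unit fires and the veto unit is silenced; for the opposite direction the roles reverse.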
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
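The regularization recipe can be made concrete on one of the early vision problems listed below, surface reconstruction: sparse, noisy depth samples underdetermine a surface (the problem is ill-posed), and adding a smoothness stabilizer makes it well-posed. A 1-D sketch with a second-difference penalty; the sampling operator, stabilizer, and λ are illustrative choices, not the paper's specific formulation.

```python
import numpy as np

def reconstruct_surface(n, sample_idx, depths, lam=1.0):
    """Surface reconstruction regularized into a well-posed problem:
    minimize sum over samples (f[i] - d_i)^2 + lam * sum (second differences)^2.
    Without the smoothness term the problem is underdetermined (ill-posed)."""
    A = np.zeros((len(sample_idx), n))
    A[np.arange(len(sample_idx)), sample_idx] = 1.0   # sparse sampling operator
    D = np.zeros((n - 2, n))                          # second-difference stabilizer
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # normal equations of the regularized quadratic functional
    return np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ np.asarray(depths))
```

Planar surfaces lie in the null space of the stabilizer, so two exact samples of a planar depth profile are reconstructed without bias; in between samples the stabilizer fills in the smoothest interpolant.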
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images (ref. 5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman (ref. 6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
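The normal/tangential decomposition has a simple local formula: at a point with spatial gradient (I_x, I_y) and temporal derivative I_t, the measurable normal component of velocity is v_n = -I_t / |∇I|, directed along the gradient; the tangential component is lost (the aperture problem). A sketch on a translating sinusoidal grating, using analytic derivatives in place of measured image derivatives:

```python
import numpy as np

def normal_flow(I_x, I_y, I_t):
    """Only the velocity component along the local gradient is recoverable
    from local measurements: v_n = -I_t / |grad I|, along grad I."""
    g = np.hypot(I_x, I_y)
    return -I_t / g * np.array([I_x, I_y]) / g

# translating grating I(x, y, t) = sin(kx*x + ky*y - w*t); the component of
# velocity along the gradient is w * (kx, ky) / |k|^2, the rest is invisible
kx, ky, w = 1.0, 2.0, 0.5
x = y = t0 = 0.3
phase = kx * x + ky * y - w * t0
I_x, I_y, I_t = kx * np.cos(phase), ky * np.cos(phase), -w * np.cos(phase)
vn = normal_flow(I_x, I_y, I_t)
```

Any true velocity satisfying kx*vx + ky*vy = w produces exactly these derivatives, so the local measurement can only pin down the component along (kx, ky), which is the underdetermination the text describes.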
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the locations in the image where intensity changes sharply.
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany.
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)
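The multiplicative scheme of Hassenstein & Reichardt is easy to sketch in code. In this minimal correlator (the function name and all parameter values are illustrative assumptions, not taken from the paper), each half-detector multiplies a delayed copy of one channel with the undelayed neighbouring channel; subtracting the mirror-image half-detector makes the sign of the output encode direction:

```python
import numpy as np

def reichardt(ch1, ch2, tau=1):
    """Minimal Hassenstein-Reichardt-style correlator: delay one channel,
    multiply with the undelayed neighbour, subtract the mirror-image
    half-detector. The sign of the mean output encodes motion direction."""
    d1 = np.roll(ch1, tau); d1[:tau] = 0   # delayed copy of channel 1
    d2 = np.roll(ch2, tau); d2[:tau] = 0   # delayed copy of channel 2
    return d1 * ch2 - ch1 * d2             # opponent multiplication

t = np.arange(200)
stim = np.sin(2 * np.pi * t / 20)              # moving sinusoidal pattern
rightward = reichardt(stim, np.roll(stim, 2))  # receptor 2 sees the pattern later
leftward  = reichardt(np.roll(stim, 2), stim)  # motion in the opposite direction
# mean(rightward) > 0 > mean(leftward): output sign follows direction
```

Note the contrast with the Barlow-Levick scheme described above: there, the delayed signal from the neighbouring receptor vetoes (inhibits) the response in the null direction instead of being multiplied with it.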
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of ...
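A cooperative disparity algorithm of this flavour can be sketched in one dimension. This is an illustrative reconstruction, not the authors' exact formulation: parameters, neighbourhood sizes and the 1-D setting are all assumptions. Cells coding the same disparity excite their neighbours (continuity constraint) while cells coding different disparities at the same location inhibit each other (uniqueness constraint), followed by a threshold:

```python
import numpy as np

def cooperative_stereo(left, right, max_d=3, iters=8, eps=1.0, theta=2.0):
    """1-D sketch of a cooperative disparity algorithm.
    C[x, d] = evidence that position x carries disparity d.
    Excitation along same-disparity neighbours (continuity constraint),
    inhibition across disparities at the same x (uniqueness constraint)."""
    n = len(left)
    C0 = np.zeros((n, max_d + 1))
    for d in range(max_d + 1):
        for x in range(n - d):
            C0[x, d] = float(left[x] == 1 and right[x + d] == 1)  # initial matches
    C = C0.copy()
    for _ in range(iters):
        excit = np.zeros_like(C)
        for d in range(max_d + 1):  # active same-disparity neighbours in a window
            excit[:, d] = np.convolve(C[:, d], np.ones(7), mode="same")
        inhib = C.sum(axis=1, keepdims=True) - C  # active cells at other disparities
        C = (excit - eps * inhib + C0 > theta).astype(float)  # synchronous update
    return C.argmax(axis=1)

rng = np.random.default_rng(0)
left = (rng.random(64) < 0.5).astype(int)  # a random-dot line
right = np.roll(left, 2)                   # same pattern shifted: true disparity 2
disp = cooperative_stereo(left, right)
# most positions converge to the true disparity of 2; spurious matches die out
```

Iteration suppresses the chance matches that a purely local correspondence would leave, which is the sense in which the local constraints cooperate to produce global order.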
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
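For the square loss, minimizing this kind of functional (empirical error plus a kernel-norm penalty) over a reproducing-kernel Hilbert space has a closed form: by the representer theorem the minimizer is f(x) = Σᵢ cᵢ K(xᵢ, x) with coefficients solving (K + μnI)c = y. A minimal sketch (the Gaussian kernel and all parameter values are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2))."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def regularized_least_squares(x, y, mu=1e-2, sigma=1.0):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over an RKHS.
    Representer theorem: f = sum_i c_i K(x_i, .) with (K + mu*n*I) c = y."""
    K = rbf_kernel(x, x, sigma)
    c = np.linalg.solve(K + mu * len(x) * np.eye(len(x)), y)
    return lambda t: rbf_kernel(t, x, sigma) @ c  # the learned function

x = np.linspace(0, 2 * np.pi, 30)
f = regularized_least_squares(x, np.sin(x), mu=1e-4)
# f approximates sin on the sampled interval
```

Larger μ trades data fit for smoothness (a smaller ‖f‖_K), which is the stabilizing effect the regularization term is there to provide.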
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
General conditions for predictivity in learning theory
Tomaso Poggio¹, Ryan Rifkin¹,⁴, Sayan Mukherjee¹,³ & Partha Niyogi²
¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory¹⁻⁵ was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$, $\lim_{n\to\infty} P\{|X_n - X| > \varepsilon\} = 0$.

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:

$$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big).$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as

$$I[f] = \int_Z V(f, z)\, d\mu(z),$$

which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,

$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$

We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,

$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$

An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,

$$\lim_{n\to\infty} P\Big\{ I[f_S] - \inf_{f \in H} I[f] > \varepsilon \Big\} = 0.$$
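These definitions can be made concrete numerically. In the following illustrative Python sketch (the toy distribution, the polynomial hypothesis and all names are assumptions, not from the paper), the empirical error I_S[f] is computed from the training set alone, while the expected error I[f] is estimated by Monte-Carlo sampling from the distribution μ, which in a real learning problem would be unknown:

```python
import numpy as np

rng = np.random.default_rng(0)

def square_loss(f, x, y):
    """V(f, z) = (f(x) - y)^2, the classical square loss of Box 1."""
    return (f(x) - y) ** 2

def empirical_error(f, S):
    """I_S[f] = (1/n) sum_i V(f, z_i), computable from the training set alone."""
    x, y = S
    return square_loss(f, x, y).mean()

def expected_error(f, sampler, m=100_000):
    """Monte-Carlo estimate of I[f] = integral of V(f, z) dmu(z);
    in practice I[f] is unknown because mu is unknown."""
    x, y = sampler(m)
    return square_loss(f, x, y).mean()

def mu_sampler(n):  # the (normally hidden) distribution mu(x, y)
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + rng.normal(0, 0.1, n)

S = mu_sampler(50)
f = np.poly1d(np.polyfit(*S, deg=2))  # a simple hypothesis fit by least squares
gap = abs(expected_error(f, mu_sampler) - empirical_error(f, S))
# generalization means this gap stays small as n grows
```

The gap |I[f_S] − I_S[f_S]| is exactly the quantity whose convergence to zero defines generalization in Box 1.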
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain:
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
– ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
– ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer ...
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Sequence: image (20 ms), blank interval (30 ms ISI), mask (1/f noise, 80 ms).
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
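A minimal sketch of the two operations such hierarchical feedforward models alternate: an S layer that matches stored templates (here plain dot products) and a C layer that pools with a max over positions, trading selectivity for invariance. The layer sizes and random templates are illustrative, not those of the published model.

```python
import numpy as np

# HMAX-style alternation sketch: S layer = template matching,
# C layer = max pooling over positions. Sizes/templates are made up.

def s_layer(image_patches, templates):
    """Template matching: response of each template at each position."""
    return image_patches @ templates.T    # rows: positions, cols: templates

def c_layer(s_responses):
    """Max pooling over positions -> position-invariant responses."""
    return s_responses.max(axis=0)

rng = np.random.default_rng(0)
templates = rng.standard_normal((4, 9))   # 4 stored 3x3 templates
patches = rng.standard_normal((16, 9))    # 16 image positions

out = c_layer(s_layer(patches, templates))
assert out.shape == (4,)

# Invariance check: permuting positions (translating the pattern)
# leaves the max-pooled C-layer output unchanged.
shuffled = patches[rng.permutation(16)]
assert np.allclose(out, c_layer(s_layer(shuffled, templates)))
```

The full model stacks several such S/C pairs, which is what gives the combined selectivity and position/scale invariance referred to in the readout slides below.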
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
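The "matrix-like readout" idea can be illustrated with a linear classifier applied to a population response vector. Here a synthetic population carries a category signal plus noise, and the readout is a single weight vector learned by the perceptron rule; the data, dimensions and training rule are illustrative, not the recorded IT data or the classifiers used in the paper.

```python
import numpy as np

# Linear readout sketch: category decoded from a synthetic neural
# population by one learned weight vector (perceptron rule).

rng = np.random.default_rng(1)
n_units, n_trials = 50, 200
signal = rng.standard_normal(n_units)     # category axis in response space
labels = rng.choice([-1.0, 1.0], size=n_trials)
responses = labels[:, None] * signal + 0.3 * rng.standard_normal((n_trials, n_units))

w = np.zeros(n_units)                     # the readout vector
for _ in range(20):                       # perceptron epochs
    for x, y in zip(responses, labels):
        if y * (x @ w) <= 0:              # misclassified -> update
            w += y * x

accuracy = np.mean(np.sign(responses @ w) == labels)
assert accuracy > 0.9
```

The point of the exercise mirrors the slide: if a simple linear operation on the population suffices, the hard work of building an invariant, categorical representation has already been done upstream.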
… in 2013 …
Fixation and tracking behavior: Reichardt's closed-loop flight simulator
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural, free flight?
In 1980, Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis; 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approaches to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies; beautiful papers on the anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
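The Hassenstein-Reichardt correlation detector sketched in these bullets multiplies each receptor's signal with a delayed copy of its neighbour's and subtracts the two mirror-symmetric arms, so opposite motion directions give opposite signs. The delay value and sinusoidal stimulus below are illustrative.

```python
import numpy as np

# Hassenstein-Reichardt correlation detector: two receptors a small
# distance apart; each arm multiplies one input with a delayed copy of
# the other; the output is the difference of the two arms.

def reichardt_detector(left, right, delay=5):
    """Opponent correlation; positive output = motion from left to right."""
    d_left = np.roll(left, delay)       # crude delay line
    d_left[:delay] = 0.0
    d_right = np.roll(right, delay)
    d_right[:delay] = 0.0
    # multiply-and-subtract: (delayed L * R) - (L * delayed R)
    return np.mean(d_left * right) - np.mean(left * d_right)

# Moving grating: the right receptor sees the left signal shifted by the
# travel time between the two sampling points.
t = np.arange(200)
stimulus = np.sin(2 * np.pi * t / 20)
left = stimulus
right_preferred = np.roll(stimulus, 5)   # pattern moves left -> right
right_null = np.roll(stimulus, -5)       # pattern moves right -> left

assert reichardt_detector(left, right_preferred) > 0
assert reichardt_detector(left, right_null) < 0
```

The multiplication is the essential nonlinearity Hassenstein & Reichardt inferred behaviourally; the "energy" model mentioned above is mathematically equivalent to this opponent correlation.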
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio & Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
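The Barlow & Levick veto scheme contrasted here can be sketched as an AND-NOT operation: the response to one receptor is suppressed by a delayed signal from the adjacent receptor, so only one direction of motion gets through. The binary pulse trains and the delay value are illustrative.

```python
import numpy as np

# Barlow-Levick veto sketch: output = r1 AND NOT (delayed r2).
# Motion in the null direction lines the delayed veto up with r1.

def veto_unit(r1, r2, delay=3):
    """Sum of r1 spikes surviving the delayed veto from r2."""
    r2_delayed = np.concatenate([np.zeros(delay), r2[:-delay]])
    return np.sum(r1 * (1 - r2_delayed))

pulse = np.zeros(30)
pulse[10] = 1.0
# preferred direction: receptor 1 is hit first, then receptor 2
resp_pref = veto_unit(r1=pulse, r2=np.roll(pulse, 3))
# null direction: receptor 2 is hit first; its delayed signal vetoes r1
resp_null = veto_unit(r1=pulse, r2=np.roll(pulse, -3))

assert resp_pref == 1.0   # spike passes
assert resp_null == 0.0   # spike vetoed
```

This inhibitory AND-NOT is the alternative to the multiplicative Reichardt correlation; the Torre & Poggio paper argues that a shunting synaptic interaction can implement essentially this veto.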
© Nature Publishing Group 1985
Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and that are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image-understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6],
one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
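The decomposition just described can be written out directly. The velocity and contour direction below are arbitrary numbers for illustration; the last assertion shows why the problem is underdetermined: any velocity differing by a tangential component produces the same local measurement.

```python
import numpy as np

# Aperture problem at one contour point: only the component of the true
# velocity V along the contour normal is locally measurable.

V = np.array([2.0, 1.0])                          # true 2-D velocity
tangent = np.array([1.0, 1.0]) / np.sqrt(2.0)     # unit tangent to contour
normal = np.array([1.0, -1.0]) / np.sqrt(2.0)     # unit normal

v_normal = (V @ normal) * normal        # what local measurements give
v_tangential = (V @ tangent) * tangent  # invisible to local measurements

# the components reconstruct V ...
assert np.allclose(v_normal + v_tangential, V)
# ... but V + c*tangent yields the same measurable normal component
c = 3.7
assert np.allclose(((V + c * tangent) @ normal) * normal, v_normal)
```

Regularization, as the paper goes on to argue, resolves this ambiguity by adding a smoothness assumption on the velocity field along the contour.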
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
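The cooperative algorithm of this paper can be sketched on a 1-D random-dot pair: cells C[x, d] code "position x has disparity d", with excitation between neighbours at the same disparity, inhibition along each line of sight, and a threshold. The neighbourhood size, inhibition constant and threshold below are illustrative choices, not the published parameters.

```python
import numpy as np

# Cooperative stereo sketch (Marr & Poggio 1976 style) on 1-D random dots.
# Update: threshold(excitation - eps*inhibition + initial matches).

rng = np.random.default_rng(2)
n, true_d, n_disp = 80, 2, 5
left = rng.integers(0, 2, n)
right = np.roll(left, true_d)            # right image = shifted left image

# initial state: all candidate matches between the two images
C0 = np.array([[1.0 if x + d < n and left[x] == right[x + d] else 0.0
                for d in range(n_disp)] for x in range(n)])

C, eps, theta = C0.copy(), 0.5, 3.5
for _ in range(8):
    # excitatory support: neighbours at the same disparity (5-cell window)
    excite = np.array([C[max(0, x - 2):x + 3].sum(axis=0) for x in range(n)])
    # inhibition: activity along the same line of sight (other d at same x)
    inhibit = C.sum(axis=1, keepdims=True) - C
    C = (excite - eps * inhibit + C0 > theta).astype(float)

interior = C[5:n - 8]                    # ignore image borders
assert np.all(interior[:, true_d] == 1.0)                  # true disparity survives
assert np.delete(interior, true_d, axis=1).mean() < 0.2    # false matches pruned
```

The two constraints the paper derives, uniqueness along each line of sight and continuity of disparity, appear here as the inhibition and excitation terms respectively.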
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels: computation, algorithms, biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
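For the square loss, the regularization functional minimized above has a closed-form solution: f(x) = Σᵢ cᵢ K(xᵢ, x) with c = (K + μℓI)⁻¹ y (regularized least squares). The Gaussian kernel, synthetic sine data and the value of μ below are illustrative.

```python
import numpy as np

# Kernel regularized least squares: closed-form minimizer of
# (1/l) sum V(y_i, f(x_i)) + mu*||f||_K^2 for the square loss.

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(40)

l, mu = len(X), 1e-2
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * l * np.eye(l), y)   # c = (K + mu*l*I)^{-1} y

X_test = np.linspace(-2.5, 2.5, 50)[:, None]
f_test = gaussian_kernel(X_test, X) @ c
assert np.abs(f_test - np.sin(X_test[:, 0])).max() < 0.25
```

The regularization term μ‖f‖²ₖ is what makes the (otherwise ill-posed) fitting problem well-posed, the same move the regularization theory of early vision makes for ill-posed vision problems.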
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49, S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects, ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory[1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
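The stability property in the abstract can be made concrete with a toy sketch: delete one training example and measure how much the learned function's prediction moves. The learning rule below, least squares over constant functions (predict the mean of the y's), is our illustrative choice, not the paper's algorithm; its leave-one-out stability shrinks like 1/n.

```python
# Toy illustration of the stability notion: remove one training example
# and measure how much the learned function's prediction changes.
# The "algorithm" is constant-function least squares (predict the mean of
# the y's) -- chosen because its stability is easy to see, not because it
# appears in the paper.

def learn(S):
    """Constant-function least squares: f_S(x) = mean of the training y's."""
    m = sum(y for _, y in S) / len(S)
    return lambda x: m

def loo_stability(S, x_probe=0.0):
    """Largest change in the prediction at x_probe when one example is deleted."""
    f = learn(S)
    return max(abs(f(x_probe) - learn(S[:i] + S[i + 1:])(x_probe))
               for i in range(len(S)))

if __name__ == "__main__":
    S = [(float(i), float(i % 2)) for i in range(100)]
    print(loo_stability(S))  # deleting one of 100 examples barely moves the mean
```

With n = 100 examples the largest leave-one-out change is on the order of 1/(2n); doubling n roughly halves it, which is the qualitative behaviour the stability condition asks for.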
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}, i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
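The ERM rule just described can be sketched in a few lines, assuming square loss and a finite hypothesis space; the data and the three candidate functions below are made up for the example.

```python
# Minimal sketch of empirical risk minimization (ERM) over a finite
# hypothesis space, assuming square loss. Data and hypotheses are
# illustrative, not from the paper.

def empirical_error(f, S):
    """I_S[f]: average square loss of hypothesis f on training set S."""
    return sum((f(x) - y) ** 2 for x, y in S) / len(S)

def erm(hypotheses, S):
    """Return the hypothesis minimizing the empirical error on S."""
    return min(hypotheses, key=lambda f: empirical_error(f, S))

if __name__ == "__main__":
    S = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]           # training pairs (x, y)
    H = [lambda x: 0.0, lambda x: x, lambda x: 2 * x]  # hypothesis space
    f_S = erm(H, S)
    print(empirical_error(f_S, S))
```

Here `erm` selects the identity function, whose training error is far below that of the other two candidates; with a richer hypothesis space the same `min` over empirical errors is exactly the ERM principle of the text.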
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,
lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)}.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] > ε ) = 0.
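The quantities in Box 1 can be checked numerically for one simple symmetric algorithm: predict the mean of the training outputs (our illustrative choice, not the paper's). The expected error I[f_S] is approximated by Monte Carlo on a large independent sample, and the gap |I[f_S] − I_S[f_S]| shrinks as n grows.

```python
# Numerical sketch of the generalization definition above: for the
# mean-predicting estimator, the gap |I[f_S] - I_S[f_S]| shrinks with n.
# The expected error is approximated on a fresh Monte Carlo sample.
import random

def gap(n, rng, n_test=20000):
    """Return |expected error - empirical error| for the mean estimator."""
    train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(train) / n                                   # learned constant f_S
    emp = sum((m - y) ** 2 for y in train) / n           # I_S[f_S]
    fresh = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    exp_err = sum((m - y) ** 2 for y in fresh) / n_test  # Monte Carlo I[f_S]
    return abs(exp_err - emp)

if __name__ == "__main__":
    print(gap(10, random.Random(0)), gap(1000, random.Random(1)))
```

Averaged over seeds, the gap at n = 1000 is roughly an order of magnitude smaller than at n = 10, which is the convergence in probability that the generalization definition formalizes.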
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model): animal present or not?
Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT read-out data. Reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
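The read-out idea, a linear classifier applied to a population response vector, can be sketched on synthetic data. The Gaussian "responses", the 64-unit population size, and the nearest-class-mean read-out below are our illustrative stand-ins; Hung et al. used real IT recordings and a trained linear (SVM-style) classifier.

```python
# Sketch of a matrix-like linear read-out from a population response:
# a nearest-class-mean classifier, which is linear in the response vector.
# The synthetic "IT responses" are made up for illustration.
import random

def centroid_readout(X, y):
    """Linear read-out f(x) = sign(w.x + b), with w joining the two class means."""
    d = len(X[0])
    mean = {c: [sum(x[i] for x, l in zip(X, y) if l == c) /
                sum(1 for l in y if l == c) for i in range(d)] for c in (0, 1)}
    w = [mean[1][i] - mean[0][i] for i in range(d)]
    b = -sum(w[i] * (mean[1][i] + mean[0][i]) / 2 for i in range(d))
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

rng = random.Random(0)
def response(c):
    # synthetic 64-"neuron" population response; the category shifts each rate
    return [rng.gauss(0.8 * c, 1.0) for _ in range(64)]

labels = [0, 1] * 50
X = [response(c) for c in labels]
readout = centroid_readout(X, labels)
test = [(response(c), c) for c in [0, 1] * 100]
acc = sum(readout(x) == c for x, c in test) / len(test)
print("decoding accuracy:", acc)
```

Even though each single "neuron" is noisy, pooling 64 of them linearly gives near-perfect decoding, which is the point of the population read-out slide.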
... in 2013 ...
Fixation and tracking behavior
Poggio, T. and W. Reichardt. A Theory of the Pattern Induced Flight Orientation of the Fly Musca domestica. Kybernetik 12, 185-203, 1972
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter.
In 1976, based on this recording technology, Reichardt & Poggio developed their theory for visual control of orientation behaviour in the fly: Parts I + II, Quart. Rev. Biophysics 9(3), 311-375.
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other.
Single-frame analysis, 3D stereo reconstruction.
Cognitive theory of basic fly instincts predicts trajectory of chasing fly ...
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio. The Müller-Lyer Figure and the Fly. Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly... similar to the Bayesian approach to cognition in humans... no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989).
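A minimal version of the Hassenstein-Reichardt correlation detector described above can be sketched directly: each half-detector multiplies one receptor's signal by a delayed (low-pass filtered) copy of its neighbour's, and the two mirror-symmetric halves are subtracted. The filter constant and the test stimulus are illustrative choices, not parameters from the original work.

```python
# Minimal correlation (Reichardt) motion detector: delay-and-multiply in
# two mirror-symmetric half-detectors, then subtract (opponency).
import math

def reichardt(left, right, alpha=0.7):
    """Opponent correlation detector; positive output = left-to-right motion."""
    out, lp_l, lp_r = [], 0.0, 0.0
    for l, r in zip(left, right):
        lp_l = alpha * lp_l + (1 - alpha) * l   # low-pass = delayed copy of left
        lp_r = alpha * lp_r + (1 - alpha) * r   # low-pass = delayed copy of right
        out.append(lp_l * r - l * lp_r)         # multiply-and-subtract
    return out

t = [i * 0.1 for i in range(200)]
def receptor(phase):
    # half-wave-rectified sinusoidal brightness signal at one photoreceptor
    return [max(0.0, math.sin(x + phase)) for x in t]

# a pattern hitting the left receptor first (right lags) reads as L-to-R motion
rightward = sum(reichardt(receptor(0.0), receptor(-0.5)))
leftward = sum(reichardt(receptor(-0.5), receptor(0.0)))
print(rightward > 0, leftward < 0)
```

Swapping the two inputs flips the sign of the summed output exactly, which is the directional selectivity the opponent (multiplicative) scheme is built to produce.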
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry Reichardt Poggio Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons...
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

Computational vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images[5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman[6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous. It can be made unique only by adding information or assumptions.
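The regularization idea, adding a stabilizing term to an otherwise ill-posed or ill-conditioned least-squares problem, can be seen in the smallest possible instance: scalar Tikhonov (ridge) regression. The near-degenerate data below are made up to mimic a problem where the data term alone does not constrain the solution.

```python
# A one-parameter instance of Tikhonov regularization: minimize
# sum((w*x - y)^2) + lam * w^2, which has the closed form below for
# scalar w. With an almost-vanishing data term the unregularized
# solution blows up under noise; the stabilizer tames it.

def ridge_1d(xs, ys, lam):
    """Closed-form minimizer of sum((w*x - y)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1e-4, 2e-4, -1e-4]   # nearly zero signal: the problem is ill-conditioned
ys = [0.01, -0.02, 0.015]  # essentially measurement noise
print(abs(ridge_1d(xs, ys, 0.0)))  # huge: noise amplified by ~1/sum(x^2)
print(abs(ridge_1d(xs, ys, 0.1)))  # small: the quadratic stabilizer dominates
```

The same structure, a data term plus a smoothness (stabilizing) functional, is what the regularization methods of the paper apply to optical flow, surface reconstruction and the other early vision problems listed above.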
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
13
13
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about …, whereas for the mammalian cortex it lies between … and ….

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers, but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process; his model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason (and also as a case in point) we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
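A toy version of such a cooperative algorithm can be sketched in one dimension (the paper's algorithm operates on two-dimensional image arrays; the function name and all parameter values here are illustrative, not from the paper). Each node C[x, d] codes a candidate match at position x and disparity d; excitation spreads between neighbouring nodes at the same disparity (continuity constraint), inhibition acts between competing disparities at the same position (uniqueness constraint).

```python
import numpy as np

def cooperative_stereo(left, right, max_disp, n_iter=25, theta=3.0, eps=2.0):
    """Toy 1-D sketch of a cooperative stereo network.

    State C[x, d] = 1 means "position x is matched at disparity d".
    Each iteration sums excitatory support along constant disparity,
    subtracts inhibition from competing disparities at the same
    position, adds the initial data, and thresholds.
    """
    n = len(left)
    # initial state: every match consistent with the image data
    C0 = np.zeros((n, max_disp + 1))
    for d in range(max_disp + 1):
        C0[: n - d, d] = (left[: n - d] == right[d:]).astype(float)
    C = C0.copy()
    for _ in range(n_iter):
        exc = np.zeros_like(C)  # excitatory neighbourhood along x, same d
        for off in (-2, -1, 1, 2):
            if off < 0:
                exc[-off:] += C[:off]
            else:
                exc[:-off] += C[off:]
        inh = C.sum(axis=1, keepdims=True) - C  # competing disparities, same x
        C = ((exc - eps * inh + C0) >= theta).astype(float)
    return C

# random-dot pair: the right image is the left shifted by two pixels
rng = np.random.default_rng(0)
left = rng.integers(0, 2, 50)
right = np.roll(left, 2)  # true disparity = 2 everywhere
C = cooperative_stereo(left, right, max_disp=4)
```

On this random-dot pair the surviving matches concentrate on the disparity-2 layer, while the spurious matches of the other layers are suppressed by the uniqueness inhibition.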
Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2
=sup2
Cooperative Computation of Stereo Disparity
D. Marr; T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \| f \|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
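For the square loss, the regularization functional above is minimized in closed form by regularized least squares: the minimizer has the form f(x) = Σᵢ cᵢ k(x, xᵢ) with c = (K + μnI)⁻¹y. A minimal sketch, with an assumed Gaussian kernel and illustrative parameter values:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for row-stacked points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def train_rls(X, y, mu=1e-4, sigma=0.5):
    """Regularized least squares: for the square loss, the minimizer of
    (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2 over the RKHS of k is
    f(x) = sum_i c_i k(x, x_i) with coefficients c = (K + mu*n*I)^(-1) y."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# fit a smooth function from 40 samples
X = np.linspace(0.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0])
f = train_rls(X, y)
```

The regularization parameter μ trades data fit against the smoothness enforced by the kernel norm, which is what makes the learning problem well-posed.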
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory (refs 1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications (ref. 6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\bigl(|X_n - X| > \epsilon\bigr) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n).$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \quad \text{in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Bigl( I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \epsilon \Bigr) = 1.$$
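The quantities defined in Box 1 (empirical error, a Monte-Carlo estimate of the expected error) and the letter's stability property (deleting one training example barely changes the learned hypothesis) can be illustrated numerically with a toy one-dimensional ridge regressor; all constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(x, y, lam=0.01):
    """One-dimensional ridge regression: minimizing
    (1/n) sum_i (y_i - w*x_i)^2 + lam*w^2 gives
    w = <x, y> / (<x, x> + lam*n)."""
    return (x @ y) / (x @ x + lam * len(x))

# training set S of n samples from y = 2x + noise
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x + 0.1 * rng.normal(size=n)
w = fit(x, y)

emp_err = np.mean((y - w * x) ** 2)           # I_S[f_S], computable from S
xt = rng.uniform(-1.0, 1.0, 10_000)           # fresh samples: Monte-Carlo I[f_S]
yt = 2.0 * xt + 0.1 * rng.normal(size=10_000)
exp_err = np.mean((yt - w * xt) ** 2)

# leave-one-out stability: delete one example, refit, compare hypotheses
w_loo = np.array([fit(np.delete(x, i), np.delete(y, i)) for i in range(n)])
beta = np.abs(w_loo - w).max()
```

For this stable algorithm the empirical error tracks the expected error closely, and the maximal change of the hypothesis under deletion of one example is tiny, which is exactly the connection between stability and predictivity that the letter formalizes.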
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database: 1,000+ Real, 3,000+ VIRTUAL; 500,000+ Non-Face Patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human Brain
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka, 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image: 20 ms; interval image-mask (ISI): 30 ms; mask (1/f noise): 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Cognition in flies: probabilistic theories then (coming only now to humans)
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
most behavioral fly research was done with the Götz torque meter
in 1976, based on this recording technology, Reichardt & Poggio developed their theory for Visual control of orientation behaviour in the fly, Part I + II, Quart. Rev. Biophysics 9(3), 311-375
open question: how well does this theory describe fly behavior of natural flight?
in 1980 Wehrhahn started high-speed film recording of flies chasing each other
single frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics, 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio: The Müller-Lyer Figure and the Fly. Science, 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
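The delay-and-multiply scheme of the Reichardt detector can be sketched in a few lines: each half-detector multiplies one receptor signal by a delayed copy of the other, and subtracting the two mirror-symmetric half-detectors gives a signed, direction-selective output. The delay, frequency and phase values below are illustrative only.

```python
import numpy as np

def reichardt(s1, s2, delay=5):
    """Minimal discrete-time Reichardt correlator for two receptor
    signals s1, s2: mean of (delayed s1) * s2 - s1 * (delayed s2)."""
    d1 = np.concatenate([np.zeros(delay), s1[:-delay]])  # delayed s1
    d2 = np.concatenate([np.zeros(delay), s2[:-delay]])  # delayed s2
    return float(np.mean(d1 * s2 - s1 * d2))

t = np.arange(500)
wave = lambda phase: np.sin(2.0 * np.pi * 0.02 * t + phase)

# a grating moving from receptor 1 towards receptor 2 arrives at 2 later
preferred = reichardt(wave(0.0), wave(-0.5))  # motion in the preferred direction
null = reichardt(wave(0.0), wave(+0.5))       # same stimulus, reversed direction
```

The output is positive for one direction of motion and negative for the other, and the interaction is multiplicative, the nonlinearity that the Hassenstein-Reichardt analysis showed to be necessary.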
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
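Such an inhibitory 'veto' can be illustrated with the steady-state voltage of a single-compartment membrane patch: when the inhibitory reversal potential sits at rest ('shunting', or silent, inhibition), the inhibitory conductance divides the excitatory response rather than subtracting from it, which gives the strongly nonlinear, multiplication-like interaction the analysis requires. All conductance and reversal values below are illustrative, not taken from the paper.

```python
def steady_state_v(g_e, g_i, g_leak=1.0, e_e=60.0, e_i=0.0, e_leak=0.0):
    """Steady-state membrane potential (mV, relative to rest) of a
    single-compartment patch with excitatory, inhibitory and leak
    conductances: V = (g_e*E_e + g_i*E_i + g_L*E_L) / (g_e + g_i + g_L)."""
    return (g_e * e_e + g_i * e_i + g_leak * e_leak) / (g_e + g_i + g_leak)

# excitation alone depolarizes strongly ...
v_exc = steady_state_v(g_e=1.0, g_i=0.0)
# ... but simultaneous shunting inhibition (reversal at rest, e_i = 0)
# divides the response rather than subtracting a fixed amount: a veto
v_vetoed = steady_state_v(g_e=1.0, g_i=10.0)
```

Because the inhibitory term enters the denominator, a strong shunting conductance suppresses the excitatory response almost completely when the two signals coincide, but has no effect when excitation arrives alone, exactly the asymmetric veto behaviour needed for directional selectivity.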
14 L 409 ] Vol 202 B
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and that are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image-understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies many of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. Since so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
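The aperture-problem decomposition described above can be illustrated numerically. This is a sketch, not code from the paper; the particular velocity vector and edge orientation are arbitrary choices.

```python
import numpy as np

# True image velocity V at a point on a smooth moving contour
V = np.array([2.0, 1.0])

# Local edge orientation: unit tangent and unit normal to the contour
theta = np.deg2rad(30.0)
t_hat = np.array([np.cos(theta), np.sin(theta)])
n_hat = np.array([-np.sin(theta), np.cos(theta)])

# A purely local measurement recovers only the normal component V . n_hat
v_normal = float(V @ n_hat)

# Any velocity v_normal * n_hat + s * t_hat (arbitrary tangential speed s)
# produces exactly the same local measurement, so the full velocity is
# underdetermined:
for s in (-1.0, 0.0, 3.5):
    candidate = v_normal * n_hat + s * t_hat
    assert np.isclose(candidate @ n_hat, v_normal)

print(f"normal component: {v_normal:.3f}; tangential component unrecoverable")
```

Every candidate velocity along the constraint line passes the same local test, which is precisely why extra assumptions (such as smoothness of the flow field) are needed to make the solution unique.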
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
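The two schemes contrasted above can be caricatured in a few lines of code. This is a toy sketch, not the models as published: the binary pulse trains stand in for receptor responses, and the pulse width and delay are arbitrary.

```python
import numpy as np

def receptor_signals(direction, n=60, width=5, delay=5):
    """Activations of two adjacent receptors as a bar sweeps across them."""
    r1, r2 = np.zeros(n), np.zeros(n)
    t0 = 20
    if direction == "preferred":          # bar reaches receptor 1 first
        r1[t0:t0 + width] = 1.0
        r2[t0 + delay:t0 + delay + width] = 1.0
    else:                                 # null direction: receptor 2 first
        r2[t0:t0 + width] = 1.0
        r1[t0 + delay:t0 + delay + width] = 1.0
    return r1, r2

def reichardt(r1, r2, delay=5):
    """Hassenstein-Reichardt correlation detector: multiplicative interaction
    of a delayed and a direct channel, minus the mirror-symmetric subunit."""
    return float(np.sum(np.roll(r1, delay) * r2 - np.roll(r2, delay) * r1))

def barlow_levick(r1, r2, delay=5):
    """Barlow-Levick scheme: excitation from receptor 1 is vetoed (AND-NOT)
    by a delayed inhibitory signal from receptor 2."""
    return float(np.sum(r1 * (1.0 - np.roll(r2, delay))))

for d in ("preferred", "null"):
    r1, r2 = receptor_signals(d)
    print(d, reichardt(r1, r2), barlow_levick(r1, r2))
```

Both toy detectors respond in the preferred direction; the correlation model also goes negative in the null direction, while the veto model is simply silenced there, mirroring the excitation-versus-inhibition distinction drawn in the text.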
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}_K} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
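For the square loss, the regularized functional on this slide has a closed-form minimizer: by the representer theorem, f(x) = Σ_i c_i K(x, x_i) with (K + nμI)c = y. A small sketch on synthetic data; the RBF kernel width, μ, and the sine target are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    """Gaussian (RBF) kernel matrix between 1-D sample arrays a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

n, mu = len(x), 1e-3
K = rbf_kernel(x, x)
# Representer theorem: the minimizer is f(x) = sum_i c_i K(x, x_i),
# with coefficients solving (K + n*mu*I) c = y for the square loss.
c = np.linalg.solve(K + n * mu * np.eye(n), y)

def f(x_new):
    return float(rbf_kernel(np.atleast_1d(x_new), x) @ c)

print("prediction at 0.25:", f(0.25), "target:", np.sin(2 * np.pi * 0.25))
```

The n·μ term in the linear system is exactly the μ‖f‖²_K penalty of the functional above; setting μ = 0 would reduce this to pure (ill-conditioned) interpolation.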
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
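A toy version of learning a classification rule from labelled expression patterns. This is emphatically not the method used in the cited work: the data are synthetic, and the rule is a simple nearest-centroid classifier chosen only to illustrate "training instead of programming".

```python
import numpy as np

rng = np.random.default_rng(3)
p = 200                                   # number of simulated "genes"

def sample_patients(cls, count):
    """Synthetic expression profiles: 10 informative genes shift by class."""
    mean = np.zeros(p)
    mean[:10] = 2.0 if cls == 1 else -2.0
    return mean + rng.standard_normal((count, p))

X_train = np.vstack([sample_patients(0, 20), sample_patients(1, 20)])
y_train = np.array([0] * 20 + [1] * 20)

# "Training" = synthesizing a rule from labelled examples; here the rule is
# nearest class centroid in expression space.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def classify(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

new_patient = sample_patients(1, 1)[0]
print("predicted class:", classify(new_patient))
```

The classifier is never told which genes matter; the rule emerges entirely from the labelled examples, which is the point of the learning-from-examples paradigm.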
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,
lim_{n→∞} P(|X_n − X| ≥ ε) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a Euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}.
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.
Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] ≥ ε ) = 0.
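As a sanity check on these definitions, the gap between empirical and expected error can be simulated numerically. The following sketch (an illustration, not from the paper) assumes a toy distribution μ with x uniform on [0, 1] and y = x plus Gaussian noise, uses the square loss, and shows I_S[f] approaching I[f] as n grows:

```python
import random

random.seed(0)

def V(f, z):
    # square loss V(f, z) = (f(x) - y)^2
    x, y = z
    return (f(x) - y) ** 2

def sample(n):
    # toy distribution mu: x uniform on [0, 1], y = x + Gaussian noise
    return [(x, x + random.gauss(0, 0.1)) for x in (random.random() for _ in range(n))]

def empirical_error(f, S):
    # I_S[f] = (1/n) * sum_i V(f, z_i)
    return sum(V(f, z) for z in S) / len(S)

def expected_error(f, n_mc=100000):
    # Monte-Carlo estimate of I[f] = integral of V(f, z) d mu(z)
    return empirical_error(f, sample(n_mc))

def f(x):
    # the regression function of this toy distribution
    return x

for n in (10, 100, 10000):
    S = sample(n)
    gap = abs(expected_error(f) - empirical_error(f, S))
    print(n, round(gap, 4))   # gap shrinks (in probability) as n grows
```

Here I[f] equals the noise variance (0.01), so the printed gap is a direct estimate of |I[f_S] − I_S[f_S]| for this fixed f.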
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere), ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea. Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons - each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations - may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Task: animal present or not? Stimulus sequence: image (20 ms), image-mask interval (ISI, 30 ms), mask of 1/f noise (80 ms)
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
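The alternation of tuning and pooling stages that such hierarchical feedforward models rely on can be sketched in a few lines. This is a hedged toy version in the spirit of the S- and C-layers of HMAX, not the published implementation; the sizes, 1-D input, and Gaussian tuning are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def s_layer(x, templates):
    # S stage ("simple" units): tuning, here a Gaussian radial-basis
    # response of each local patch to each stored template (selectivity)
    n, k = len(x) - 2, len(templates)
    out = np.empty((n, k))
    for i in range(n):
        patch = x[i:i + 3]
        out[i] = np.exp(-np.sum((templates - patch) ** 2, axis=1))
    return out

def c_layer(s, pool=4):
    # C stage ("complex" units): max pooling over position,
    # which buys tolerance to translation of the preferred feature
    return np.array([s[i:i + pool].max(axis=0) for i in range(0, len(s), pool)])

signal = rng.normal(size=32)          # toy 1-D "image"
templates = rng.normal(size=(5, 3))   # 5 stored patch templates
s1 = s_layer(signal, templates)       # selective responses, 30 positions x 5 units
c1 = c_layer(s1)                      # position-tolerant responses, 8 pools x 5 units
```

Stacking further S/C pairs on top of `c1` would repeat the same selectivity-then-invariance motif at larger scales.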
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
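The "matrix-like read-out" of Hung et al. was, at its core, a linear classifier applied to a matrix of population responses. The sketch below is a hedged stand-in using synthetic data (not the recorded IT responses): a least-squares linear readout recovers category from a noisy pseudo-population:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic "population responses": trials x neurons, two categories,
# each category defined by a different mean response pattern
n_neurons, n_per_class = 50, 200
mu_a = rng.normal(0, 1, n_neurons)
mu_b = rng.normal(0, 1, n_neurons)
X = np.vstack([mu_a + rng.normal(0, 1.0, (n_per_class, n_neurons)),
               mu_b + rng.normal(0, 1.0, (n_per_class, n_neurons))])
y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])

# linear read-out: least-squares weights with a bias column
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def decode(x):
    # sign of the weighted sum gives the decoded category
    return 1.0 if np.append(x, 1.0) @ w > 0 else -1.0

acc = np.mean([decode(x) == t for x, t in zip(X, y)])
```

A proper replication would cross-validate on held-out trials and sweep position/scale conditions; this sketch only shows the linear-readout mechanics.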
… in 2013 …
The beginning of untethered flight analysis: Bülthoff, Poggio & Wehrhahn, Z. Naturforsch. 35c, 811-815 (1980)
Most behavioral fly research was done with the Götz torque meter
In 1976, based on this recording technology, Reichardt & Poggio developed their theory of visual control of orientation behaviour in the fly: Parts I + II, Quart. Rev. Biophysics 9(3), 311-375
Open question: how well does this theory describe fly behavior in natural flight?
In 1980 Wehrhahn started high-speed film recording of flies chasing each other
Single-frame analysis, 3D stereo reconstruction
Cognitive theory of basic fly instincts predicts trajectory of chasing fly …
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly … similar to the Bayesian approach to cognition in humans … no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Götz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize video cameras (see also Bülthoff, Little and Poggio, Nature 1989)
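The correlation scheme summarized in these bullets fits in a short sketch. This is an illustrative Hassenstein-Reichardt detector, with a first-order low-pass filter standing in for the delay channel; the time constants and stimulus are arbitrary assumptions:

```python
import math

def lowpass(signal, tau=5.0):
    # first-order low-pass filter: the "delay" channel of the detector
    out, y = [], 0.0
    for s in signal:
        y += (s - y) / tau
        out.append(y)
    return out

def reichardt(left, right):
    # correlate each receptor signal with the delayed copy of the other,
    # then take the opponent difference; the sign encodes direction
    dl, dr = lowpass(left), lowpass(right)
    return sum(r * d for r, d in zip(right, dl)) - \
           sum(l * d for l, d in zip(left, dr))

# a sinusoidal grating drifting from the left receptor toward the right:
# the right receptor sees the same signal, later in time
t = [0.1 * k for k in range(400)]
left = [math.sin(x) for x in t]
right = [math.sin(x - 0.5) for x in t]

preferred = reichardt(left, right)   # motion in the preferred direction
null = reichardt(right, left)        # same stimulus, reversed direction
```

Multiplying one channel against a delayed version of the other is exactly the nonlinear (multiplicative) interaction the slides attribute to Hassenstein & Reichardt; the opponent subtraction makes the output signed.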
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1 a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1 b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow amp Levick consists of the experimental recognition th a t movement detection a t the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism th a t lsquovetoesrsquo the response to simultaneous signals from the receptors (after appropriate asymmetric delay) rather than from the detection of the conjunction of excitation from two regions (see figure 1) On the other hand the main thrust of Hassenstein amp Reichardtrsquos analysis is the demonstration tha t the interaction underlying movement detection must be nonlinear and in particular of a multi-plicative type Many experimental data suggest tha t this is indeed the functional scheme underlying movement detection in insects (Poggio amp Reichardt 1976)
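The inhibitory 'veto' alternative can likewise be sketched. The toy model below assumes, in the spirit of a shunting-inhibition mechanism, that a delayed inhibitory conductance divides rather than subtracts from the excitatory signal; the stimulus and all constants are illustrative:

```python
def lowpass(signal, tau=5.0):
    # first-order low-pass filter: the asymmetric delay on the inhibitory path
    out, y = [], 0.0
    for s in signal:
        y += (s - y) / tau
        out.append(y)
    return out

def veto_unit(excite, inhibit, g=10.0):
    # shunting (divisive) inhibition: a delayed inhibitory conductance
    # "vetoes" the excitatory input instead of subtracting from it
    delayed = lowpass(inhibit)
    return sum(e / (1.0 + g * i) for e, i in zip(excite, delayed))

def bar(onset, width=20, n=200):
    # a bright bar sweeping past a receptor: on from `onset` for `width` steps
    return [1.0 if onset <= k < onset + width else 0.0 for k in range(n)]

# null direction: the bar reaches the inhibitory receptor first,
# so the delayed veto is active when excitation arrives
null = veto_unit(bar(50), bar(30))
# preferred direction: excitation leads, the veto arrives too late
pref = veto_unit(bar(30), bar(50))
```

The unit responds strongly only when excitation precedes inhibition, which is the AND-NOT logic that the Barlow & Levick scheme attributes to directionally selective cells.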
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
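The regularization remedy for such underdetermined measurements can be illustrated on a one-dimensional toy problem (a sketch, not the paper's formulation): minimize a data term plus λ times a smoothness term, which turns an ill-posed reconstruction into a well-posed linear system:

```python
import numpy as np

rng = np.random.default_rng(2)

# ill-posed toy problem: recover a signal at 50 points from noisy
# samples at only 10 of them; the data alone leave it underdetermined
n = 50
truth = np.sin(np.linspace(0, np.pi, n))
idx = rng.choice(n, 10, replace=False)
data = truth[idx] + rng.normal(0, 0.05, 10)

# sampling operator A and first-difference operator D (smoothness)
A = np.zeros((10, n))
A[np.arange(10), idx] = 1.0
D = np.diff(np.eye(n), axis=0)

# Tikhonov-style regularized solution:
#   minimize ||A f - data||^2 + lam * ||D f||^2
# whose normal equations give a unique, stable answer
lam = 0.1
f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ data)
```

Without the `lam * D.T @ D` term the system `A.T @ A` is singular (40 of the 50 values are unconstrained); the smoothness prior is exactly the kind of natural constraint the text invokes to make the inverse problem unique.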
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the …
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
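A minimal version of such a cooperative stereo algorithm can be sketched on a one-dimensional random-dot stereogram. The weights, threshold, window size, and circular shift below are illustrative assumptions, not the paper's parameters; the update combines excitation within a disparity layer (continuity), inhibition between layers at the same position (uniqueness), and the initial data term:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1-D random-dot stereogram: the right image is the left image shifted
# by a true disparity of 2 pixels (circularly, for simplicity)
n, true_d, max_d = 60, 2, 4
left = rng.integers(0, 2, n)
right = np.roll(left, true_d)

# initial state C0[d, x] = 1 wherever left and right dots match at shift d
C0 = np.array([(left == np.roll(right, -d)).astype(float) for d in range(max_d + 1)])
C = C0.copy()

# iterate: excitation from neighbours in the same disparity layer,
# inhibition from competing disparities at the same position,
# plus the initial data term, followed by a threshold
for _ in range(8):
    excite = np.array([np.convolve(row, np.ones(5), mode="same") for row in C])
    inhibit = C.sum(axis=0, keepdims=True) - C
    C = ((excite - inhibit + 2.0 * C0) > 3.0).astype(float)

winner = int(C.sum(axis=1).argmax())   # disparity layer with most support
```

The true-disparity layer starts fully consistent, so mutual excitation sustains it while the uniqueness inhibition extinguishes the spurious matches in the competing layers; after a few iterations the surviving layer identifies the disparity.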
Cooperative Computation of Stereo Disparity
D. Marr, T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception, and Vision provides inspiration for the continuing effort to understand them.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖²_K ]
Predictive regularization algorithms
Theorems on foundations of learning
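For the square loss and a reproducing-kernel hypothesis space, the regularization functional on the preceding slide (empirical error plus μ‖f‖²_K) has a closed-form minimizer: by the representer theorem, f(x) = Σᵢ cᵢ K(xᵢ, x) with c = (K + μnI)⁻¹y. A minimal sketch, assuming a Gaussian kernel and synthetic data (all names and parameter values here are illustrative):

```python
import numpy as np

def rls_fit(X, y, mu=1e-3, gamma=1.0):
    # Minimize (1/n) * sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over the
    # RKHS of a Gaussian kernel; the representer theorem gives
    # f(x) = sum_i c_i K(x_i, x) with c = (K + mu*n*I)^{-1} y.
    n = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    def f(x):
        return np.exp(-gamma * (np.asarray(x)[:, None] - X[None, :]) ** 2) @ c
    return f

# Learn sin(x) from 40 noisy samples.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 2 * np.pi, 40))
y = np.sin(X) + 0.05 * rng.normal(size=40)
f = rls_fit(X, y)
x_test = np.linspace(0.5, 2 * np.pi - 0.5, 50)
```

The regularization parameter μ trades the fit to the data against the smoothness of f, which is exactly the trade-off the slide's functional expresses.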
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio¹, Ryan Rifkin¹⁴, Sayan Mukherjee¹³ & Partha Niyogi²
¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory¹⁻⁵ was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ. Training means synthesizing a function that best represents the relation between the inputs xᵢ and the corresponding outputs yᵢ.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables Xₙ converges in probability to a random variable X (for example, lim_{n→∞} |Xₙ − X| = 0 in probability) if and only if for every ε > 0,
lim_{n→∞} P(|Xₙ − X| > ε) = 0

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of Rᵏ. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = (z₁ = (x₁, y₁), …, zₙ = (xₙ, yₙ))

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Zⁿ → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z)
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y)
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^{n} V(f, zᵢ)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1
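The letter's stability notion (delete one training example; the learned hypothesis changes little at the deleted point) can be probed numerically. A hedged sketch using regularized least squares, an algorithm known to be stable in this sense; the data, kernel, and parameter values are invented for illustration:

```python
import numpy as np

def rls_predict(X, y, x_new, mu=0.1, gamma=1.0):
    # Regularized least squares in a Gaussian RKHS.
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    c = np.linalg.solve(K + mu * len(X) * np.eye(len(X)), y)
    return np.exp(-gamma * (x_new - X) ** 2) @ c

def loo_stability(n, seed=0):
    # Average change of the learned hypothesis at x_i when example i
    # is deleted from the training set (leave-one-out perturbation).
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, n)
    y = np.sin(3 * X) + 0.1 * rng.normal(size=n)
    diffs = []
    for i in range(n):
        full = rls_predict(X, y, X[i])
        mask = np.arange(n) != i
        loo = rls_predict(X[mask], y[mask], X[i])
        diffs.append(abs(full - loo))
    return np.mean(diffs)
```

As the letter's argument suggests, the perturbation shrinks as the training set grows, which is the empirical face of the stability-implies-generalization connection.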
Letters to Nature. NATURE, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work
• Training database
• 1,000+ real, 3,000+ virtual
• 500,000+ non-face patterns
Sung & Poggio, 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10¹⁰–10¹¹ neurons (~1 million flies), 10¹⁴–10¹⁵ synapses
Vision: what is where
• Ventral stream in rhesus monkey:
– ~10⁹ neurons in the ventral stream (350×10⁶ in each hemisphere)
– ~15×10⁶ neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka, 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer…
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9520 spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio, 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio, 2005; Serre, Oliva, Poggio, 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Stimulus sequence: image, 20 ms; image-mask interval (ISI), 30 ms; mask (1/f noise), 80 ms
Thorpe et al., 1996; Van Rullen & Koch, 2003; Bacon-Macé et al., 2005
Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
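The hierarchical feedforward models cited here alternate two operations: template matching ('S' units, which build tuning) and max pooling ('C' units, which build invariance). A minimal one-dimensional sketch of the two operations, not the full HMAX architecture; stimulus and pooling sizes are invented for the demo:

```python
import numpy as np

def s1(image, template):
    # S-unit stage: template match at every position (tuning).
    n, t = len(image), len(template)
    return np.array([image[i:i + t] @ template for i in range(n - t + 1)])

def c1(s_resp, pool=8):
    # C-unit stage: local max pooling over position (invariance).
    return np.array([s_resp[i:i + pool].max()
                     for i in range(0, len(s_resp) - pool + 1, pool)])

# A 1-D "bar" stimulus presented at two nearby positions.
template = np.array([1.0, 1.0, 1.0])
img_a = np.zeros(32); img_a[10:13] = 1.0
img_b = np.zeros(32); img_b[13:16] = 1.0   # shifted by 3 pixels

sa, sb = s1(img_a, template), s1(img_b, template)
ca, cb = c1(sa), c1(sb)
```

The S-layer responses to the two stimuli differ (the peak moves with the bar), while the pooled C-layer responses are identical: the max operation has absorbed the position shift, which is the mechanism by which such models build up the ventral stream's position tolerance.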
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio, 2005
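The "matrix-like read-out" idea of Hung et al. is that a simple linear classifier trained on a population of IT responses can decode category, and generalizes across position and scale. The toy simulation below only caricatures that logic with synthetic "neurons"; no real recordings or published parameters are involved:

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_trials = 60, 200

# Synthetic population: each neuron carries a category preference that is
# (by construction) tolerant to a nuisance variable such as position.
pref = rng.normal(size=n_neurons)

def population_response(category, position):
    # category in {-1, +1}; position perturbs firing but not the category signal
    signal = 0.8 * category * pref
    nuisance = 0.3 * position * rng.normal(size=n_neurons)
    return signal + nuisance + 0.5 * rng.normal(size=n_neurons)

# Train a least-squares linear readout at position 0 ...
cats = rng.choice([-1, 1], n_trials)
R_train = np.array([population_response(c, 0.0) for c in cats])
w, *_ = np.linalg.lstsq(R_train, cats.astype(float), rcond=None)

# ... and test the SAME readout at a new position.
cats_test = rng.choice([-1, 1], n_trials)
R_test = np.array([population_response(c, 1.0) for c in cats_test])
acc = np.mean(np.sign(R_test @ w) == cats_test)
```

Because the category signal in this population is position-tolerant, the linear weights learned at one position still classify well at another, which is the signature Hung et al. reported for real IT populations.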
… in 2013 …
Cognitive theory of basic fly instincts predicts trajectory of chasing fly…
Wehrhahn, C., T. Poggio and H. Bülthoff, Biological Cybernetics, 45, 123-130, 1982
Cognition in flies
Geiger, G. and T. Poggio, The Müller-Lyer Figure and the Fly, Science, 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varjú) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
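The Reichardt detector described in these bullets correlates the delayed signal from one photoreceptor with the undelayed signal from its neighbour, and an opponent stage subtracts the mirror-symmetric product, so the sign of the output reports direction. A minimal sketch with invented stimulus parameters:

```python
import numpy as np

def reichardt(s1, s2, delay=3):
    # Hassenstein-Reichardt correlation detector: each half-detector
    # multiplies the delayed signal of one receptor with the undelayed
    # signal of its neighbour; the opponent stage subtracts the two halves.
    d1 = np.roll(s1, delay)   # delayed channel of receptor 1
    d2 = np.roll(s2, delay)   # delayed channel of receptor 2
    return np.mean(d1 * s2 - d2 * s1)

# A moving sinusoidal grating sampled by two adjacent photoreceptors
# (200 samples = 10 full periods, so the circular delay is exact).
t = np.arange(200)
phase = 2 * np.pi * t / 20.0
rightward = reichardt(np.sin(phase), np.sin(phase - np.pi / 4))  # receptor 2 lags
leftward  = reichardt(np.sin(phase), np.sin(phase + np.pi / 4))  # receptor 2 leads
```

Motion toward receptor 2 yields a positive mean output and motion the other way a negative one, the behaviour Hassenstein and Reichardt inferred from the beetle's optomotor responses.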
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry Reichardt Poggio Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
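The synaptic veto the paper proposes can be caricatured with a one-compartment steady-state membrane equation: an inhibitory conductance whose reversal potential sits at rest hyperpolarizes nothing by itself, but divides the excitatory drive ("shunting" or silent inhibition), approximating the multiplicative veto nonlinearity. This is a toy sketch with invented conductance values, not the paper's detailed biophysical model:

```python
def membrane_response(g_exc, g_inh, g_leak=1.0, E_exc=1.0, E_inh=0.0):
    # Steady-state voltage of a one-compartment membrane (rest = 0).
    # With E_inh at rest, inhibition is "silent": it enters only the
    # denominator, vetoing excitation divisively.
    return (g_exc * E_exc + g_inh * E_inh) / (g_exc + g_inh + g_leak)

# Preferred direction: excitation arrives alone (inhibition delayed away).
preferred = membrane_response(g_exc=2.0, g_inh=0.0)
# Null direction: the asymmetrically delayed inhibition coincides with excitation.
null = membrane_response(g_exc=2.0, g_inh=20.0)
```

With coincident shunting inhibition the same excitatory input produces a fraction of the preferred-direction response, which is the directional asymmetry the mechanism is meant to explain.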
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images⁵. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman⁶, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
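The aperture problem just described can be made concrete: each local measurement constrains only the normal component V·n, so a single constraint leaves V underdetermined, while constraints from several contour orientations pin it down. The sketch below uses the simplifying assumption of a single rigid translation (rather than the full regularized velocity field the authors develop); all numbers are invented:

```python
import numpy as np

# Normal-flow constraints V . n_s = v_s collected around a translating
# closed contour: each orientation alone leaves the tangential component
# free (the aperture problem), but together they determine V.
true_V = np.array([1.5, -0.5])
angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
N = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # contour normals
v_n = N @ true_V                                        # measured normal components

# All twelve constraints together: least squares recovers V exactly.
V_hat, *_ = np.linalg.lstsq(N, v_n, rcond=None)

# A single orientation: the minimum-norm solution keeps only the
# normal component and misses the tangential one.
V_one, *_ = np.linalg.lstsq(N[:1], v_n[:1], rcond=None)
```

Adding constraints (here, more orientations; in the paper, a smoothness regularizer) is what converts the ill-posed local problem into a well-posed one.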
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
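The delayed-veto scheme of Barlow & Levick, with the divisive (shunting) inhibition reading proposed in this paper, can be illustrated by a toy simulation. This is my own sketch, not the paper's model: the time base, delay, and shunting constant k are arbitrary illustrative values.

```python
import numpy as np

def veto_unit(excit, inhib, delay=2, k=50.0):
    """Directionally selective unit: excitation AND-NOT delayed inhibition.

    The inhibitory channel is delayed and acts divisively (shunting),
    in the spirit of the synaptic mechanism discussed above; k is an
    illustrative shunting strength, not a fitted parameter.
    """
    inhib_delayed = np.concatenate([np.zeros(delay), inhib[:-delay]])
    return np.sum(excit / (1.0 + k * inhib_delayed))

# A pulse of light crossing two neighbouring receptors A and B:
n = 20
at_t5 = np.zeros(n); at_t5[5] = 1.0
at_t7 = np.zeros(n); at_t7[7] = 1.0

# Preferred direction: the stimulus reaches the excitatory receptor
# first, so the delayed veto arrives after the excitation has passed.
preferred = veto_unit(excit=at_t5, inhib=at_t7)
# Null direction: the vetoing receptor is stimulated first; its delayed
# signal now coincides with the excitation and shunts the response.
null = veto_unit(excit=at_t7, inhib=at_t5)
```

Swapping the order of stimulation leaves the input energies identical; only the timing relative to the asymmetric delay changes, which is the essence of the veto scheme.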
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
[Scanned text of the paper, garbled in extraction. The recoverable opening observes that perhaps one of the most striking differences between a brain and today's computers is the amount of wiring: in a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000. The body introduces the class of cooperative algorithms, parallel algorithms that operate on many input elements and reach a global organization by way of local interactive constraints, notes their biological relevance (including Julesz's suggestion that stereoscopic fusion is a cooperative process), and then (i) analyzes the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describes a cooperative algorithm that implements this computation, and (iii) exhibits its performance on random-dot stereograms.]
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
\[ \min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right] \]
Predictive regularization algorithms
Theorems on foundations of learning
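With the square loss, the regularization functional above becomes regularized least squares: by the representer theorem the minimizer has the form f(x) = Σ_i c_i K(x, x_i) with c = (K + nμI)^{-1} y. A minimal sketch, in which the Gaussian kernel and all parameter values are my own illustrative choices:

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))

def regularized_fit(x, y, mu=1e-4, sigma=0.2):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2 over an RKHS.

    Representer theorem: f(t) = sum_i c_i K(t, x_i), with coefficients
    c = (K + n*mu*I)^{-1} y; mu > 0 makes the problem well posed.
    """
    n = len(x)
    K = gaussian_kernel(x, x, sigma)
    c = np.linalg.solve(K + n * mu * np.eye(n), y)
    return lambda t: gaussian_kernel(np.asarray(t, dtype=float), x, sigma) @ c

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

f = regularized_fit(x, y)
train_err = np.mean((f(x) - y)**2)   # small but nonzero: mu trades data
                                     # fit against smoothness of f
```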
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
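The deleting-one-example stability property can be checked numerically for a concrete algorithm. The sketch below is my own illustration, using regularized least squares as an example of a stable learning map, with arbitrary parameter values: delete one training example, retrain, and measure how much the hypothesis moves.

```python
import numpy as np

def rls(x_train, y_train, x_test, mu=0.1, sigma=0.3):
    """Regularized least squares prediction (a stable learning map)."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))
    n = len(x_train)
    c = np.linalg.solve(k(x_train, x_train) + n * mu * np.eye(n), y_train)
    return k(x_test, x_train) @ c

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(60)
grid = np.linspace(0.0, 1.0, 100)

f_full = rls(x, y, grid)
# Perturb the training set by deleting one example and retrain:
f_loo = rls(np.delete(x, 0), np.delete(y, 0), grid)
change = np.max(np.abs(f_full - f_loo))   # the hypothesis barely moves
```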
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
\[ \lim_{n\to\infty} \mathbb{P}\{|X_n - X| > \varepsilon\} = 0. \]

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
\[ S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}. \]

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
\[ I[f] = \int_Z V(f, z) \, d\mu(z), \]
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
\[ I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y). \]
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
\[ I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i). \]

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
\[ \lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}. \]
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
\[ \lim_{n\to\infty} \mathbb{P}\left\{ I[f_S] \le \inf_{f \in H} I[f] + \varepsilon \right\} = 1. \]
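The distinction between empirical and expected error can be illustrated by Monte Carlo: for a fixed hypothesis f, the empirical error on a small training set is close to the expected error, which a large fresh sample approximates. The data model below is a toy example of my own, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    """Draw n i.i.d. samples z = (x, y) from a toy distribution mu."""
    x = rng.uniform(-1.0, 1.0, n)
    y = x**2 + 0.1 * rng.standard_normal(n)   # y = x^2 plus noise
    return x, y

f = lambda x: x**2                  # a fixed hypothesis f
V = lambda x, y: (f(x) - y)**2      # square loss V(f, z)

x_S, y_S = sample(50)
I_S = np.mean(V(x_S, y_S))          # empirical error I_S[f] on a small S

x_big, y_big = sample(200_000)      # large fresh sample approximates I[f]
I_f = np.mean(V(x_big, y_big))      # exact value is the noise variance 0.01
gap = abs(I_f - I_S)                # small: I_S is a good proxy for I[f]
```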
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419, www.nature.com/nature. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database • 1,000+ real, 3,000+ virtual • 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human Brain: 10^10-10^11 neurons (~1 million flies); 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere); ~15 x 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; † Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Stimulus sequence: image (20 ms); image-mask interval (ISI, 30 ms); mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
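The hierarchical models cited above alternate template-matching ("S") layers, which build selectivity, with max-pooling ("C") layers, which build invariance. A toy sketch of that idea (my own illustration, not the actual model code; template and images are invented) shows how a max over positions yields a translation-invariant response:

```python
import numpy as np

def s_layer(image, template):
    """Template matching at every position: an S-unit response map."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * template)
    return out

def c_layer(s_map):
    """Max pooling over all positions: a C-unit, position invariant."""
    return s_map.max()

bar = np.zeros((3, 3)); bar[:, 1] = 1.0          # vertical-bar template

img_a = np.zeros((12, 12)); img_a[4:7, 3] = 1.0  # the bar at one place
img_b = np.zeros((12, 12)); img_b[1:4, 9] = 1.0  # same bar, translated

r_a = c_layer(s_layer(img_a, bar))
r_b = c_layer(s_layer(img_b, bar))
# The S maps differ (the peak moves with the bar), but the pooled
# C response is identical: selectivity plus translation invariance.
```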
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
… in 2013 …
Cognition in flies
Geiger, G. and T. Poggio, The Muller-Lyer Figure and the Fly, Science 190, 479-480, 1975
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies (beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz)
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
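The correlation scheme itself is easy to simulate: each half of the detector multiplies one receptor's signal by a low-pass filtered (delayed) copy of its neighbour's, and the two mirror-symmetric halves are subtracted, so the time-averaged output is signed by the direction of motion. A sketch with arbitrary time constants and a drifting sinusoidal stimulus:

```python
import numpy as np

def lowpass(x, tau):
    # First-order low-pass filter: the "delay" line of one detector arm.
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

def reichardt(a, b, tau=5.0):
    # Opponent correlator: delayed A times B, minus A times delayed B.
    return lowpass(a, tau) * b - a * lowpass(b, tau)

# Receptors A and B viewing a grating that drifts from A to B:
# B sees the same signal as A, phase-lagged.
t = np.arange(400)
a = np.sin(2 * np.pi * t / 40)
b = np.sin(2 * np.pi * (t - 5) / 40)

out_ab = reichardt(a, b)   # positive mean: motion in the A -> B direction
out_ba = reichardt(b, a)   # the opposite direction flips the sign
```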
Relative motion and figure-ground discrimination the fly
Work by Werner Reichardt (with Poggio and Hausen and later with M Egelhaaf and A Borst)
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly), relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1 a and b summarize the main conclusions of the two approaches Both models postulate the existence of two types of channels (1 and 2 from two adjacent receptor regions) with different conduction properties In figure 1 a channel 1 and channel 2 are low pass filters with a short and a long time constant respectively while in figure 1 b channel 2 simply contains a delay
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
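The veto operation can be caricatured in a few lines: a unit excited by one receptor and divisively (shunting) inhibited by a delayed copy of its neighbour's signal responds to one direction of motion and is suppressed in the other. The pure delay, pulse stimulus and constants below are illustrative choices, not the paper's biophysical model.

```python
import numpy as np

def pulse(t0, length=300, width=20):
    # A brief contrast pulse passing over a receptor at time t0.
    x = np.zeros(length)
    x[t0:t0 + width] = 1.0
    return x

def veto_unit(exc, inh, delay=15, g=20.0):
    # Shunting inhibition: division rather than subtraction.
    # A large g effectively "vetoes" coincident excitation.
    return exc / (1.0 + g * np.roll(inh, delay))

transit = 15  # time for a moving stimulus to travel from receptor A to B

# Null direction: A is hit first, so its delayed inhibition arrives
# exactly when B's excitation does, and the response is vetoed.
null = veto_unit(exc=pulse(100 + transit), inh=pulse(100))

# Preferred direction: B is hit first; excitation passes before the
# delayed inhibition from A arrives.
pref = veto_unit(exc=pulse(100), inh=pulse(100 + transit))
```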
Computational vision and regularization theory. Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
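A one-dimensional caricature of this point: differentiation of noisy data is a classic ill-posed problem, and a Tikhonov-style regularizer (here penalizing curvature) restores stability. The step size, noise level and regularization weight are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 200, 0.05
t = np.arange(n) * h
y = np.sin(t) + 0.01 * rng.normal(size=n)      # noisy samples of sin(t)

# Naive finite differences amplify the noise by roughly 1/h.
naive = np.gradient(y, h)

# Regularized: first smooth y by minimizing ||x - y||^2 + lam * ||D x||^2
# (D = second-difference operator), then differentiate the smooth estimate.
D = np.diff(np.eye(n), 2, axis=0)
lam = 10.0
x = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
reg = np.gradient(x, h)

err_naive = np.max(np.abs(naive - np.cos(t))[5:-5])
err_reg = np.max(np.abs(reg - np.cos(t))[5:-5])   # much smaller
```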
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
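Numerically, this is just an underdetermined linear equation. Brightness constancy at a point on an edge gives one equation, Ix*u + Iy*v + It = 0, in the two unknowns (u, v); only the normal component -It/|grad I| is recoverable. The edge orientation and true velocity below are arbitrary choices for illustration.

```python
import numpy as np

true_v = np.array([1.0, 0.5])                              # actual velocity (u, v)
normal = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])  # unit edge normal

# On a straight edge the spatial gradient points along the normal; the
# temporal derivative then follows from brightness constancy.
Ix, Iy = normal
It = -(Ix * true_v[0] + Iy * true_v[1])

# One equation, two unknowns: only this projection of the velocity is measured.
v_normal = -It / np.hypot(Ix, Iy)
```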
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~ 1979 T Poggio and D Marr MPI Tuebingen
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
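A one-dimensional sketch of the cooperative algorithm: a binary node C[x, d] asserts "the point at x has disparity d", with excitatory support from neighbours at the same disparity (the continuity constraint) and inhibition among nodes sharing a position (the uniqueness constraint), iterated to a stable state. The neighbourhood, weights and threshold below are illustrative choices, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ndisp, true_d = 60, 5, 2
left = rng.integers(0, 2, n + ndisp).astype(float)   # random-dot "image"
right = np.roll(left, true_d)                        # right eye: shifted copy

# Initial state: a node is on wherever left/right pixels match at disparity d.
C = np.stack([(left[:n] == np.roll(right, -d)[:n]).astype(float)
              for d in range(ndisp)], axis=1)        # shape (n, ndisp)
C0 = C.copy()

for _ in range(10):
    # Continuity: support from nearby positions at the same disparity.
    exc = sum(np.roll(C, s, axis=0) for s in (-3, -2, -1, 1, 2, 3))
    # Uniqueness: inhibition from other disparities at the same position.
    inh = C.sum(axis=1, keepdims=True) - C
    C = (exc + 3.0 * C0 - 1.5 * inh > 4.5).astype(float)

est = C.argmax(axis=1)
acc = np.mean(est == true_d)   # most positions settle on the true disparity
```

The false matches at wrong disparities are mostly isolated, so the continuity term starves them while the inhibition term lets the dense true-disparity layer win.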
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…
• …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + μ ‖f‖_K² ]
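For the square loss V(y, f(x)) = (y - f(x))², a functional of this regularized form has a closed-form minimizer by the representer theorem: f(x) = Σ_i c_i K(x, x_i) with c = (K + μℓI)⁻¹ y. A minimal numpy sketch with a Gaussian kernel; the kernel width, μ and the toy regression target are arbitrary choices.

```python
import numpy as np

def rls_fit(X, y, mu=1e-3, gamma=1.0):
    # Regularized least squares in an RKHS: the minimizer of
    # (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 is
    # f(x) = sum_i c_i K(x, x_i) with c = (K + mu * n * I)^{-1} y.
    n = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda x: np.exp(-gamma * (x[:, None] - X[None, :]) ** 2) @ c

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 40))
y = np.sin(X) + 0.1 * rng.normal(size=40)      # noisy samples of sin
f = rls_fit(X, y)

grid = np.linspace(-3, 3, 100)
err = np.max(np.abs(f(grid) - np.sin(grid)))   # small: a smooth, stable fit
```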
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000, and in revised form June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task given data of the form $S = \{(x_i, y_i)\}_{i=1}^n$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (written $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}(|X_n - X| \ge \epsilon) = 0.$$

Training data. The training data comprise input and output pairs. The input space $X$ is assumed to be a compact domain in a Euclidean space and the output space $Y$ a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically distributed samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and the output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote by $V(f, z)$ the price we pay when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error on a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called the empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and for any $\epsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\left(I[f_S] \ge \inf_{f \in H} I[f] + \epsilon\right) = 0.$$
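These definitions can be exercised numerically. The sketch below is my own illustration, not code from the paper: it uses ridge regression as the learning map, estimates the generalization gap |I_S[f_S] − I[f_S]| on fresh samples, and checks the stability property from the abstract, namely that deleting one training example barely changes the learned hypothesis. The problem sizes and the regularization constant are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1.0):
    # Learning map L: S -> f_S over the hypothesis space of linear functions;
    # the regularizer lam is what makes the map stable under perturbation of S.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def error(w, X, y):
    # Empirical error I_S[f] (or an estimate of I[f] on fresh samples),
    # with the square loss V(f, z) = (f(x) - y)^2
    return np.mean((X @ w - y) ** 2)

# Synthetic supervised-learning problem: y = <w_star, x> + noise
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
X_new = rng.normal(size=(5000, d))
y_new = X_new @ w_star + 0.1 * rng.normal(size=5000)

w = ridge_fit(X, y)
gap = abs(error(w, X, y) - error(w, X_new, y_new))   # |I_S[f_S] - I[f_S]|

# Stability: delete one example and compare the two learned hypotheses
w_loo = ridge_fit(X[1:], y[1:])
change = np.max(np.abs(w - w_loo))

print(gap, change)   # both are small for this stable algorithm
```

Running the same experiment with an unstable, overfitting learning map (say, unregularized interpolation with d close to n) makes both quantities blow up, which is the connection the letter formalizes.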
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419 (www.nature.com/nature). © 2004 Nature Publishing Group.
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995–2018)
• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
• Ventral stream in the rhesus monkey:
  – ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
  – ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
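This progression can be caricatured in a few lines of code: a template-matching "simple" stage followed by a max-pooling "complex" stage, the building block of the HMAX-style feedforward models discussed later in the deck. The 1-D image and template below are toy stand-ins of my own, not stimuli from any experiment.

```python
import numpy as np

def s_layer(image, template):
    # 'Simple'-cell stage: template match at every position (1-D toy version)
    k = len(template)
    return np.array([image[i:i + k] @ template
                     for i in range(len(image) - k + 1)])

def c_layer(s, pool=5):
    # 'Complex'-cell stage: max over a local window of positions,
    # which is what buys the (limited) invariance to position
    return np.array([s[i:i + pool].max()
                     for i in range(0, len(s) - pool + 1, pool)])

template = np.array([1.0, -1.0, 1.0])   # toy preferred feature
image = np.zeros(30)
image[10:13] = [1, -1, 1]               # feature at position 10
shifted = np.roll(image, 2)             # same feature, shifted by 2 samples

r1 = c_layer(s_layer(image, template))
r2 = c_layer(s_layer(shifted, template))
# The strongest pooled response has the same size and pooled position:
print(r1.max(), r1.argmax(), r2.max(), r2.argmax())
```

Stacking such S/C pairs grows both the receptive field and the tolerated shift at each level, mirroring the V1 → IT trend described above.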
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image: 20 ms; image–mask interval (ISI): 30 ms; mask (1/f noise): 80 ms
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT read-out data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
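The "matrix-like read-out" in Hung et al. is, at its core, a linear classifier applied to a vector of IT population responses; invariance of the code shows up as above-chance decoding when the classifier is trained at one position/scale and tested at another. Below is a toy version with a made-up tuning model: the units, gains, and noise levels are all invented for illustration, not fitted to the recordings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_trials = 64, 200

# Invented tuning: each unit prefers one of two categories; changing the
# stimulus position only rescales the gain, so category info is preserved.
pref = rng.choice([-1.0, 1.0], size=n_units)

def responses(cats, gain):
    base = 1.0 + 0.5 * np.outer(cats, pref)       # (trials, units) mean rates
    return gain * base + 0.2 * rng.normal(size=base.shape)

cats_train = rng.choice([-1.0, 1.0], size=n_trials)
R = responses(cats_train, gain=1.0)               # objects at trained position
w = np.linalg.lstsq(R, cats_train, rcond=None)[0]  # linear read-out weights

cats_test = rng.choice([-1.0, 1.0], size=n_trials)
R_shift = responses(cats_test, gain=0.6)          # same objects, shifted/scaled
acc = np.mean(np.sign(R_shift @ w) == cats_test)
print(acc)   # well above the 0.5 chance level
```

The point of the sketch is the transfer test in the last lines: the weights are never retrained for the shifted condition, which is the operational definition of an invariant read-out.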
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Work at 3 levels
• Fixation and tracking behavior of the fly (cognition in the fly… similar to the Bayesian approach to cognition in humans… no neurons)
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits
• Biophysics of computation
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Bülthoff, Little and Poggio, Nature 1989)
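The correlation scheme described in these bullets is easy to simulate: each half-detector multiplies one receptor's signal by a low-pass-filtered (delayed) copy of the other's, and the two mirror-symmetric halves are subtracted. A minimal sketch with illustrative parameters (not Reichardt's original constants):

```python
import numpy as np

def lowpass(x, tau):
    # First-order low-pass filter: the asymmetric 'delay' arm of the detector
    out = np.zeros_like(x)
    for t in range(1, len(x)):
        out[t] = out[t - 1] + (x[t] - out[t - 1]) / tau
    return out

def reichardt(a, b, tau=5.0):
    # Correlate each receptor with the delayed copy of its neighbour,
    # then subtract the mirror-symmetric half-detector:
    # output > 0 for motion from a to b, < 0 for the opposite direction.
    return np.mean(lowpass(a, tau) * b - a * lowpass(b, tau))

t = np.arange(400)
phase = 2 * np.pi * t / 50.0
rightward = reichardt(np.sin(phase), np.sin(phase - 0.5))  # b lags a
leftward = reichardt(np.sin(phase - 0.5), np.sin(phase))   # a lags b

print(rightward > 0, leftward < 0)   # opponent, direction-selective output
```

The opponent subtraction makes the output exactly antisymmetric under swapping the two inputs, which is why the same circuit signals direction with a single signed number.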
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio & Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409–416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

† Università di Genova, Istituto di Fisica, Genoa, Italy
‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S.; received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
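The veto idea can be simulated directly. In the spirit of the paper's proposal, inhibition below acts divisively (shunting) on the excitation, silencing the cell when excitation and appropriately delayed inhibition coincide; in the preferred direction the inhibition arrives too late. The gains, delays and stimulus are all illustrative choices of mine, not the paper's biophysical parameters.

```python
import numpy as np

def shunting_veto(exc, inh, g_i=10.0):
    # Divisive (shunting) interaction: inhibition 'vetoes' coincident
    # excitation rather than subtracting from it linearly
    return exc / (1.0 + g_i * inh)

def delayed(x, d):
    out = np.zeros_like(x)
    out[d:] = x[:-d]
    return out

def pulse(t_on, n=100, width=10):
    x = np.zeros(n)
    x[t_on:t_on + width] = 1.0
    return x

d = 8  # transit time between the two receptors = inhibitory delay
# Null direction: the stimulus crosses receptor A, then B; the delayed
# inhibition from A lands exactly on B's excitation and vetoes it.
null = shunting_veto(pulse(48), delayed(pulse(40), d)).sum()
# Preferred direction: B is hit first, so its excitation escapes the veto.
pref = shunting_veto(pulse(40), delayed(pulse(48), d)).sum()

print(pref, null)   # strong response in one direction only
```

Note that a purely subtractive inhibition of the same strength would merely reduce the null response; the divisive form suppresses it almost completely, which is the nonlinearity the paper argues for.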
© 1985 Nature Publishing Group
Computational vision and regularization theory

Tomaso Poggio, Vincent Torre & Christof Koch

Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.

Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.

Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.

Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
pDJDzPX+PzXZ4]+gJXUdz+[Z4+]zgXzXXZ4]u+g4z JTz A]RIUCz CPX+Pz X]04]z HUz +z p4PPu]4CjP+g41zR+UU4]zXXZ4]+gJn4zZD4UXRu4U+z+]4zp5PPzNUXpUzJUzZDrdIdzzz+U0zJgzD+dz44UzZ]XZXd41zgD+gzgD4rzR+rzZP+rz+Uz JR[X]g+Ugz ]XP4z JUz JXPXCJ+Qz drdg4Rdz+dzp4PPzampT4zX7zgD4z4+]PJ4dgzdjCC4dugJXUdz+PXUCz gD5d4z PJT4dzp+dzR+14zrz$juP4dsz pDXz R+HUg+HUdz gD+gz dg4]4XxdXZJz 8mdIXUz Jdz +z XXZ4]+gJn4z Z]X4ddzHdzRX14PzpDJDzXUdJdgdzX7z+Tz+]]+rzX7z0JZXP4zR+CU4gdzpJgDzdZ]HUCdzXjZPJUCzgD4zgJZdzX7z+0M+4Ugz1JZXP4dz]4Z]4d4Ugdz+zdjCuC4dgHn4z R4g+ZDX]z A]z gDJdz I14+z 4dJ14dzJgdzHXPXCJ+Pz]4P4n+U4zgD4z4qg]+gJXUzX7zdg4]4XdXZJzIUA]R+gHXUzJdz+UzJRZXag+Ugz+U1zr4gzjUdXPn41zZ]XP4RzIUznJdj+PzJUA]uR+gJXTz Z]X4ddJUCz z X]z gDJdz ]4+udXU+U1z +PdXz +dz +z +d4z JUz ZXJUhp4z14d]J4z+zXXZ4]+gJn4z+PCX]LgDRzA]zgDJdzXRZjg+gJXUz
UzgDJdz+`iIP4zp4zHz+U+Prs4zgD4zXRu[jg+gJXU+Pz dgajgj]4z X7z gD4z dg4]4X1Hdu[+]Igrz Z]XP4Rz dg+gJWCz gD4z CX+Pz X7z gD4zXRZjg+gIXTz+U0zD+]+g4]JsHUCzgD4z+ddXyI+g53z PX+Pz XUdg]-Ugdz IIz 14d]J4z +zXXZ4]+gJn4z +PCX]JgDRz gD+gz JRZP4R4UgdzgDKdzXRZjg+gJXUz+U1zJJIz4qDJJgzIgdzZ4]uA]R+U4z XUz ]+U1XR131Xgz dg4]4XC]+RdzPgDXjCDzgD4zZ]XP4Rz+11]4dd41zD4]4zJdzUXgz 1J]5gPrz ]4P+g41z gXz gD4z j4dgJXUz X7z
Jsup2[nsup2 sup2nyensup2 yenwsup2 Gbrvbarsnsup2Tyenwwqwsup2Znpnyendegshysup2 [n nqsect wcurrenyencentsup2 Tpoundcurrenyensectcurrenwsup2 sup2 gwqshysup2Indegptwsup2$sup2gsup2_sup2 sup2nyensup2currenwsup2[nnot_nssup2T yencurrensectcurrensup2|sup2 H qwsup2 Xshypwwyen sup2 0sup2 gsectpwsup2 sup2fwplusmnn currenn wsup2=sup2 Qwnshysup2
=sup2
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
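The cooperative algorithm the paper describes updates a three-dimensional network of disparity cells under two local constraints: uniqueness (cells on the same line of sight inhibit each other) and continuity (nearby cells carrying the same disparity excite each other). The following is a minimal NumPy sketch in that spirit, not the paper's exact update rule: parameters are hand-tuned and borders are handled crudely by wrap-around.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random-dot stereogram: a central square sits at disparity 2, the
# background at disparity 0.
N, d_sq = 40, 2
left = rng.integers(0, 2, (N, N))
right = left.copy()
right[10:30, 10:30] = left[10:30, 10 - d_sq:30 - d_sq]

disparities = np.arange(4)
# Initial state: 1 wherever the two images match at that disparity.
C0 = np.stack([(left == np.roll(right, -d, axis=1)).astype(float)
               for d in disparities])

def neighborhood_sum(plane):
    # 3x3 spatial neighbourhood (continuity: support from nearby cells
    # carrying the same disparity), wrap-around borders.
    s = np.zeros_like(plane)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            s += np.roll(np.roll(plane, dy, axis=0), dx, axis=1)
    return s

C = C0.copy()
eps, theta = 2.0, 6.5          # inhibition weight and threshold, hand-tuned
for _ in range(8):
    excit = np.stack([neighborhood_sum(C[d]) for d in disparities])
    # Uniqueness: cells on the same left or right line of sight inhibit
    # each other.
    left_inhib = C.sum(axis=0)[None] - C
    right_inhib = np.stack([
        sum(np.roll(C[dp], dp - d, axis=1) for dp in disparities if dp != d)
        for d in disparities])
    C = ((excit - eps * (left_inhib + right_inhib) + C0) >= theta).astype(float)

depth = C.argmax(axis=0)       # winning disparity per pixel
print("disparity histogram:", np.bincount(depth.ravel(), minlength=4))
```

The random-dot stereogram is the key test case: each single cell sees only chance matches, yet the network settles toward a globally consistent disparity map.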
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception, and Vision provides inspiration for their continuing study.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
$$\min_{f\in H}\left[\frac{1}{n}\sum_{i=1}^{n}V\!\left(y_i,\, f(x_i)\right)+\mu\,\|f\|_K^2\right]$$
Predictive regularization algorithms
Theorems on foundations of learning
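The regularized learning functional on this slide, minimizing (1/n) Σ V(y_i, f(x_i)) + μ‖f‖²_K over a hypothesis space H, has a closed-form minimizer for the square loss in a reproducing-kernel space: f(x) = Σ c_i K(x, x_i) with coefficients solving (K + μnI)c = y. A minimal sketch on synthetic data (kernel choice and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth target function.
n = 30
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(x) + 0.1 * rng.normal(size=n)

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

mu = 1e-2                                        # regularization parameter
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + mu * n * np.eye(n), y)   # (K + mu*n*I) c = y

x_grid = np.linspace(-3, 3, 200)
f_grid = gaussian_kernel(x_grid, x) @ c          # f(x) = sum_i c_i K(x, x_i)
print("mean |f - sin| on grid:", np.abs(f_grid - np.sin(x_grid)).mean())
```

Larger μ trades fidelity to the noisy samples for smoothness of the estimate, which is the point of the regularization term.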
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory1–5 was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications6. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i), i = 1, ..., n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = (z_1 = (x_1, y_1), ..., z_n = (x_n, y_n))
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².
Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z)
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y)
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P(I[f_S] − inf_{f∈H} I[f] > ε) = 0
letters to nature
Nature, Vol. 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
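The stability property at the heart of the letter, that deleting one training example should not change the learned hypothesis much, can be probed numerically. A toy sketch with one-dimensional regularized least squares (synthetic data, not the paper's experiments): refit after deleting each example in turn and measure how far the hypothesis moves.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic one-dimensional regression problem.
n = 40
x = rng.uniform(-1, 1, n)
y = 2 * x + 0.1 * rng.normal(size=n)

def fit_ridge(x, y, mu=0.1):
    # Regularized least squares in one dimension:
    # argmin_w (1/n) * sum((y - w*x)**2) + mu * w**2
    return float((x @ y / len(x)) / (x @ x / len(x) + mu))

w_full = fit_ridge(x, y)
# Perturb the training set by deleting one example at a time.
changes = [abs(fit_ridge(np.delete(x, i), np.delete(y, i)) - w_full)
           for i in range(n)]
print("hypothesis on full set:", w_full)
print("largest leave-one-out change:", max(changes))
```

For a stable (here, regularized) algorithm the largest change shrinks as n grows, which is exactly the kind of condition the letter connects to generalization.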
Why do hierarchical architectures work
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls, and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
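The view-tuned-unit idea can be caricatured with Gaussian radial basis units centred on stored views, in the spirit of the Poggio-Edelman model: each unit is view-tuned, while a pooled readout over the units is nearly view-invariant. Angles and tuning width below are invented for illustration.

```python
import numpy as np

# Gaussian view-tuned units centred on stored training views.
stored_views = np.linspace(0, 330, 12)     # 12 stored views (degrees)
sigma = 40.0                               # tuning width (degrees)

def unit_responses(view):
    # Circular angular distance from each stored view, then Gaussian tuning.
    d = np.abs((stored_views - view + 180) % 360 - 180)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

probe_views = np.arange(0, 360, 5)
pooled = np.array([unit_responses(v).sum() for v in probe_views])
# Individual units are view-tuned; the pooled response barely varies
# with viewpoint.
print("pooled min/max ratio:", pooled.min() / pooled.max())
```

This mirrors the experimental picture above: single cells tuned around particular views, with view-independent behavior emerging at the population level.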
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask (1/f noise)
80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
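The "matrix-like readout" result, a simple linear classifier trained on a population of IT responses decoding category despite position and scale changes, can be caricatured with synthetic population data (all numbers invented; trial-to-trial variability stands in for the image transformations):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "population code": 64 units, each category has a mean response
# pattern; additive variability stands in for position/scale changes.
n_units, n_train, n_test = 64, 200, 100
signal = rng.normal(size=(2, n_units))           # mean pattern per category
labels = rng.integers(0, 2, n_train)
X = signal[labels] + 0.8 * rng.normal(size=(n_train, n_units))

# Linear readout: least-squares classifier on the population responses.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(n_train)], 2.0 * labels - 1.0,
                        rcond=None)

# Held-out trials with fresh variability (e.g. a new object position).
labels_test = rng.integers(0, 2, n_test)
X_test = signal[labels_test] + 0.8 * rng.normal(size=(n_test, n_units))
pred = (np.c_[X_test, np.ones(n_test)] @ w) > 0
accuracy = float((pred == (labels_test == 1)).mean())
print("decoding accuracy:", accuracy)
```

The point of the sketch is the linearity of the readout: if category is explicitly represented in the population, a weighted sum of unit responses suffices to decode it on new trials.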
... in 2013 ...
Motion algorithm: the beetle Chlorophanus and Reichardt's motion detector
Motion algorithm: the beetle and the fly
• The beetle follows the motion
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector
• The same model describes motion perception in flies: beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989)
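The delay-and-multiply correlation scheme behind the Reichardt detector can be written in a few lines: two receptors, a delayed copy of each signal, and an opponent multiplication whose sign reports the direction of motion. A sketch with a drifting sinusoidal grating (receptor spacing, delay and units are arbitrary choices for illustration):

```python
import numpy as np

# Opponent Reichardt correlator: each receptor signal is multiplied by
# the delayed copy of the other; the difference of the two products,
# averaged over time, is signed by the direction of motion.
def reichardt(v, dt=0.01, delay_steps=10, n_steps=2000):
    t = np.arange(n_steps) * dt
    A = np.sin(2 * np.pi * (0.00 - v * t))   # receptor at x = 0
    B = np.sin(2 * np.pi * (0.25 - v * t))   # receptor a quarter wavelength away
    A_cur, B_cur = A[delay_steps:], B[delay_steps:]
    A_del, B_del = A[:-delay_steps], B[:-delay_steps]   # delayed copies
    return float(np.mean(A_del * B_cur - A_cur * B_del))

print("rightward grating:", reichardt(+1.0))   # positive output
print("leftward grating:", reichardt(-1.0))    # negative output
```

Each receptor alone sees only an alternation of dark and light, as the bullet above notes; direction emerges only from the asymmetric (delayed) pairing of the two signals.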
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst 2003
Two of the neurons...
Work at 3 levels
bull Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative-motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
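The Barlow-Levick 'veto' scheme discussed in the excerpt can be sketched as an AND-NOT gate: excitation from one receptor passes unless the appropriately delayed signal from its neighbour shunts it. Timings and values below are illustrative, not taken from the paper.

```python
import numpy as np

# Direction-selective unit implementing the veto (AND-NOT) scheme:
# the response is A AND NOT (delayed B).
def ds_response(A, B, delay=3):
    B_delayed = np.concatenate([np.zeros(delay), B[:-delay]])
    return float(np.sum(A * (1 - B_delayed)))

T = 20
A = np.zeros(T); B = np.zeros(T)
A[5:8] = 1; B[8:11] = 1            # bar sweeps A -> B: preferred direction
preferred = ds_response(A, B)

A2 = np.zeros(T); B2 = np.zeros(T)
B2[5:8] = 1; A2[8:11] = 1          # bar sweeps B -> A: null direction
null = ds_response(A2, B2)
print("preferred:", preferred, "null:", null)
```

In the null direction the delayed inhibition arrives exactly in time to veto the excitation, so the unit stays silent; this is the inhibitory alternative to the multiplicative Reichardt scheme, and the paper's proposal is a specific synaptic (shunting) implementation of such a veto.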
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical word and the imaging stage (see box) They represent conceptually independent modules that can be studied to a first approximation in isola-tion Information from the different processes however has to be combined Furthermore different modules may interact early on Finally the processing cannot be purely bottom-up specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing
Computational theories of early vision modules typically deal with the dual issues of representation and process They must specify the form of the input and the desired output (the rep-resentation) and provide the algorithms that transform one into the other (the process) Here we focus on the issue of processes and algorithms for which we describe the unifying theoretical framework of regularization theories We do not consider the equally important problem of the primitive tokens that represent the input of each specific process
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. Because so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman,
one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity; the tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
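The aperture problem can be written down in a few lines. From the brightness-constancy constraint Ix·u + Iy·v + It = 0, purely local measurements determine only the velocity component along the intensity gradient (the "normal flow"), v_n = −It ∇I / |∇I|². The sketch below is illustrative (the translating-ramp stimulus is an assumption, not an example from the text):

```python
import numpy as np

# Normal flow from two frames: the only velocity component that local
# measurements can recover.  v_n = -It * grad(I) / |grad(I)|^2
def normal_flow(I0, I1):
    Iy, Ix = np.gradient(I0)              # spatial derivatives (rows, cols)
    It = I1 - I0                          # temporal derivative
    g2 = Ix**2 + Iy**2 + 1e-12            # avoid division by zero
    return -It * Ix / g2, -It * Iy / g2   # (vn_x, vn_y)

# Synthetic example: an intensity ramp I(x, y) = x translated by one pixel
# in x.  The gradient has no y component, so any simultaneous y-motion
# would be invisible -- exactly the aperture ambiguity.
x = np.arange(64.0)
I0 = np.tile(x, (64, 1))
I1 = np.tile(x - 1.0, (64, 1))            # shifted one pixel in x
vx, vy = normal_flow(I0, I1)
print(vx[32, 32], vy[32, 32])             # x-component recovered, y unknown
```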
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
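A toy simulation in the spirit of the correlation (Hassenstein-Reichardt) scheme makes the multiplicative nonlinearity tangible. The discrete low-pass filter, its time constant and the sinusoidal stimulus below are illustrative assumptions, not the published model:

```python
import numpy as np

# Minimal correlation-type motion detector sketch.  Each half-detector
# multiplies the low-pass-filtered (delayed) signal from one receptor with
# the undelayed signal from the adjacent receptor; subtracting the two
# mirror-symmetric half-detectors gives a direction-selective output.
def lowpass(s, tau=5.0):
    # simple one-pole low-pass filter: introduces the asymmetric delay
    out, y = np.empty_like(s), 0.0
    for i, v in enumerate(s):
        y += (v - y) / tau
        out[i] = y
    return out

def reichardt(a, b):
    # opponent subtraction of the two multiplicative half-detectors
    return np.mean(lowpass(a) * b - a * lowpass(b))

t = np.arange(500)
stim = lambda phase: np.sin(2 * np.pi * 0.02 * t + phase)
a, b = stim(0.0), stim(-0.5)       # B lags A: pattern moves from A toward B
print(reichardt(a, b) > 0)         # preferred direction: positive output
print(reichardt(b, a) > 0)         # reversed motion: negative output
```

Without the multiplication (e.g. summing the channels instead), the mean output is zero for both directions, which is the nonlinearity argument in miniature.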
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
\[
\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2 \right]
\]
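For the regularization functional min over f in H of (1/n) Σ V(y_i, f(x_i)) + μ‖f‖²_K with the square loss, the minimizer has the closed form of kernel ridge regression: f(x) = Σ c_i K(x_i, x) with c = (K + μnI)⁻¹y, by the representer theorem. A minimal sketch on synthetic data (the Gaussian kernel width, μ and the sine target are illustrative assumptions):

```python
import numpy as np

# Kernel ridge regression: the exact minimizer of
#   (1/n) * sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2
# over a reproducing-kernel Hilbert space, via c = (K + mu*n*I)^{-1} y.
def gaussian_kernel(X1, X2, sigma=0.3):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
n = 40
X = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(n)

mu = 1e-3
K = gaussian_kernel(X, X)
c = np.linalg.solve(K + mu * n * np.eye(n), y)   # representer-theorem coefficients

X_test = np.linspace(0, 1, 200)
f_test = gaussian_kernel(X_test, X) @ c          # f(x) = sum_i c_i K(x_i, x)
print(np.max(np.abs(f_test - np.sin(2 * np.pi * X_test))))
```

The μ·n·I term plays the same stabilizing role here as the smoothness term does in the early-vision regularization problems.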
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi
Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA; Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA; Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA; Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if for every $\varepsilon > 0$,
\[
\lim_{n\to\infty} P\bigl(|X_n - X| > \varepsilon\bigr) = 0 .
\]

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
\[
S = \bigl(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\bigr) .
\]

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n \geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
\[
I[f] = \int_Z V(f, z) \, d\mu(z) ,
\]
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
\[
I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y) .
\]
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
\[
I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i) .
\]

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
\[
\lim_{n\to\infty} \bigl| I[f_S] - I_S[f_S] \bigr| = 0 \quad \text{in probability.}
\]
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
\[
\lim_{n\to\infty} P\Bigl( I[f_S] > \inf_{f \in H} I[f] + \varepsilon \Bigr) = 0 .
\]
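The stability property at the heart of the letter (delete one training example; the learned hypothesis should barely move) can be probed numerically. The sketch below uses a Tikhonov-regularized kernel algorithm, a class the theory covers; the data, kernel width and regularization constant are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

# Leave-one-out stability probe: train on S, then on S with one example
# deleted, and measure the sup-norm change of the learned function.
def train(X, y, mu=1e-2, sigma=0.3):
    # Tikhonov-regularized kernel regression (Gaussian kernel)
    K = np.exp(-(X[:, None] - X[None, :])**2 / (2 * sigma**2))
    c = np.linalg.solve(K + mu * len(X) * np.eye(len(X)), y)
    return lambda t: np.exp(-(t[:, None] - X[None, :])**2 / (2 * sigma**2)) @ c

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(n)
grid = np.linspace(0, 1, 200)

f_full = train(X, y)(grid)
changes = [np.max(np.abs(train(np.delete(X, i), np.delete(y, i))(grid) - f_full))
           for i in range(n)]
print(max(changes))   # small: deleting one example barely moves the hypothesis
```

As n grows, this worst-case leave-one-out change shrinks for stable algorithms, which is precisely the kind of condition the letter ties to generalization.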
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio, 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain:
  - 10^10-10^11 neurons (~1 million flies)
  - 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
  - ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  - ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes
Kobatake & Tanaka, 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation and the process by which sensory input is normalized is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
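Viewer-centered recognition can be made concrete with a toy sketch: each object is stored as a few two-dimensional views (feature vectors standing in for images), each stored view gets a Gaussian-tuned unit, and recognition pools the view-tuned responses per object. All names and numbers here are invented for illustration.

```python
import numpy as np

def view_tuned_response(x, stored_view, sigma=1.0):
    """Gaussian (RBF) unit tuned to one stored 2-D view."""
    return np.exp(-np.sum((x - stored_view) ** 2) / (2 * sigma ** 2))

def recognize(x, object_views, sigma=1.0):
    """Score each object by pooling (max) over its view-tuned units;
    return the best-scoring object label."""
    scores = {
        label: max(view_tuned_response(x, v, sigma) for v in views)
        for label, views in object_views.items()
    }
    return max(scores, key=scores.get)

# Toy "feature vectors" standing in for images of two objects seen
# from a few viewpoints (values made up for illustration).
object_views = {
    "A": [np.array([0.0, 0.0]), np.array([1.0, 0.0])],
    "B": [np.array([5.0, 5.0]), np.array([6.0, 5.0])],
}

assert recognize(np.array([0.9, 0.1]), object_views) == "A"
assert recognize(np.array([5.5, 5.0]), object_views) == "B"
```

Generalization to nearby views comes from the width sigma of the tuning; the memory cost objection mentioned above corresponds to the number of stored views needed per object.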
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Stimulus sequence: image (20 ms), image-mask interstimulus interval (30 ms), mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
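The alternating stages of such models can be sketched in one dimension: "S" units do Gaussian-tuned template matching (selectivity), "C" units max-pool over position (invariance), in the spirit of HMAX. Layer sizes, templates and sigma below are illustrative, not taken from the papers.

```python
import numpy as np

def s_layer(signal, templates, sigma=0.5):
    """S units: Gaussian tuning of each local patch to stored templates
    (template matching, giving selectivity)."""
    n, k = len(signal), len(templates[0])
    out = np.zeros((len(templates), n - k + 1))
    for t, tmpl in enumerate(templates):
        for i in range(n - k + 1):
            d2 = np.sum((signal[i:i + k] - tmpl) ** 2)
            out[t, i] = np.exp(-d2 / (2 * sigma ** 2))
    return out

def c_layer(s_out, pool=2):
    """C units: max pooling over position, building shift tolerance."""
    cols = [s_out[:, i:i + pool].max(axis=1)
            for i in range(0, s_out.shape[1] - pool + 1, pool)]
    return np.stack(cols, axis=1)

# A feature at two nearby positions gives identical C-layer output:
# selectivity (S) and invariance (C) are built in alternating stages.
templates = [np.array([1.0])]
a = c_layer(s_layer(np.array([1.0, 0.0, 0.0, 0.0]), templates))
b = c_layer(s_layer(np.array([0.0, 1.0, 0.0, 0.0]), templates))
assert np.allclose(a, b)
```

Stacking several such S/C pairs yields increasingly complex, increasingly invariant features, which is the core architectural idea.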
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
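The "matrix-like read-out" amounts to a linear classifier applied to population response vectors. A sketch on synthetic data (the population statistics below are invented for illustration; this is not the recorded IT data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for population responses: n_trials x n_neurons,
# with category carried by a subset of "selective" neurons plus noise.
n_neurons, n_trials = 50, 200
labels = rng.integers(0, 2, n_trials)           # category: 0 or 1
selectivity = np.zeros(n_neurons)
selectivity[:10] = 1.0                          # 10 category-selective cells
responses = rng.normal(0, 0.5, (n_trials, n_neurons)) \
    + np.outer(labels, selectivity)

# Linear read-out: least-squares weights on the population vector.
X = np.hstack([responses, np.ones((n_trials, 1))])  # add bias column
w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)

predicted = (X @ w > 0).astype(int)
accuracy = (predicted == labels).mean()
assert accuracy > 0.9
```

The point of the read-out approach is that a fixed linear weighting of a neural population suffices to recover category (and, with other weights, identity) from brief response windows.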
… in 2013 …
Motion algorithm: the beetle and the fly
• The beetle follows the motion.
• Each photoreceptor sees only an alternation of dark and light: how is motion computed?
• Reichardt and Hassenstein (and Peter Kunze) found the rules used by neural circuits. The algorithm (refined by D. Varju) explained many data: the Reichardt detector.
• The same model describes motion perception in flies; beautiful papers on anatomy, optics and organization of motion perception by Braitenberg, Kirschfeld, Goetz.
• An equivalent ("energy") model (Adelson) describes motion cells in primate cortex.
• A form of it has been used by Matsushita in the first chips to stabilize videocameras (see also Buelthoff, Little and Poggio, Nature 1989).
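The delay-and-multiply scheme can be sketched directly. In each mirror-symmetric half of the opponent detector, the signal from one receptor is low-pass filtered (the "delay") and multiplied with the undelayed signal from its neighbour; the two halves are subtracted. The time constant and stimulus below are illustrative.

```python
import numpy as np

def low_pass(x, tau):
    """Discrete first-order low-pass filter (the 'delay' channel)."""
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

def reichardt(left, right, tau=5.0):
    """Opponent Reichardt correlator: delay-and-multiply in each
    mirror-symmetric half, then subtract. A positive mean output
    signals left-to-right motion, negative the reverse."""
    return low_pass(left, tau) * right - left * low_pass(right, tau)

# A moving luminance pattern: the right receptor sees the same signal
# as the left one, a few time steps later (rightward motion).
t = np.arange(200)
left = (np.sin(2 * np.pi * t / 40) > 0).astype(float)
right = np.roll(left, 3)

assert reichardt(left, right).mean() > 0   # rightward: positive sign
assert reichardt(right, left).mean() < 0   # leftward: negative sign
```

Note that each photoreceptor signal alone carries no direction information, matching the bullet above: direction emerges only from the nonlinear (multiplicative) interaction between neighbouring channels.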
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre and T. Poggio

Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
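The veto can be sketched as shunting (divisive) inhibition, an approximate AND-NOT: in the null direction the delayed inhibitory signal arrives in register with the excitation and suppresses it. The gain g, delay and stimulus below are illustrative, not values from the paper.

```python
import numpy as np

def veto_unit(excitation, inhibition, g=10.0):
    """Shunting (divisive) inhibition: a biophysically plausible
    approximation of the AND-NOT 'veto' operation."""
    return excitation / (1.0 + g * inhibition)

def ds_response(inh_receptor, exc_receptor, delay=3):
    """Direction-selective unit: excitation from one receptor, vetoed
    by the delayed signal from the adjacent receptor."""
    inh = np.concatenate([np.zeros(delay), inh_receptor[:-delay]])
    return veto_unit(exc_receptor, inh).sum()

# One stimulus moving from receptor A to receptor B (A fires first).
n, t0, d = 20, 5, 3
a = np.zeros(n); a[t0] = 1.0          # receptor A fires at t0
b = np.zeros(n); b[t0 + d] = 1.0      # receptor B fires at t0 + d

# Unit inhibited from A: delayed inhibition coincides with B's
# excitation, so A->B is its null direction and the response is vetoed.
null_resp = ds_response(a, b, delay=d)
# Mirror-symmetric unit inhibited from B: for the same stimulus the
# inhibition arrives too late, so the excitation passes through.
pref_resp = ds_response(b, a, delay=d)
assert pref_resp > 5 * null_resp
```

This is the qualitative contrast drawn above: a veto (AND-NOT) circuit in the Barlow & Levick spirit, as opposed to the multiplicative conjunction of the Reichardt scheme.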
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
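The regularization idea can be illustrated on a toy ill-posed problem: reconstructing a 1-D "surface" from sparse, noisy depth samples by minimizing a data term plus a smoothness (membrane) stabilizer. All sizes and the weight `lam` are made up for illustration.

```python
import numpy as np

# Sparse depth samples along a scan line: an ill-posed reconstruction,
# since infinitely many surfaces fit the data exactly.
n = 50
sample_idx = np.array([3, 12, 24, 37, 46])
depths = np.array([0.0, 1.0, 2.0, 1.0, 0.0])

# Sampling operator A picks out the measured positions.
A = np.zeros((len(sample_idx), n))
A[np.arange(len(sample_idx)), sample_idx] = 1.0

# Stabilizer: first finite differences (a 'membrane' smoothness term).
D = np.diff(np.eye(n), axis=0)

# Tikhonov regularization: solve min ||A f - d||^2 + lam ||D f||^2
# via the normal equations.
lam = 1.0
f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ depths)

assert np.allclose(f[sample_idx], depths, atol=0.3)   # fits the data
assert np.abs(np.diff(f)).max() < 0.5                 # and is smooth
```

The smoothness term plays the role of the "natural constraint" discussed above: it selects, among all surfaces consistent with the data, the physically plausible smooth one, and makes the inverse problem well-posed.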
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception, and Vision continues to provide inspiration for their study.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation
– algorithms
– biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience: models + experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
\[
\min_{f \in \mathcal{H}_K} \left[ \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \, \| f \|_K^2 \right]
\]
Predictive regularization algorithms
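A minimal sketch of one such algorithm: regularized least squares with a Gaussian kernel, minimizing the empirical square loss plus a multiple mu of the squared RKHS norm. All parameter values and data below are illustrative.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_rls(X, y, mu=0.001, sigma=1.0):
    """Regularized least squares: with the square loss, the minimizer
    of (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2 over the RKHS is
    f(x) = sum_i c_i K(x, x_i) with c = (K + mu n I)^(-1) y."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Z: gaussian_kernel(Z, X, sigma) @ c

# Fit noisy samples of a smooth target function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 40)

f = train_rls(X, y)
Xtest = np.linspace(-2, 2, 20)[:, None]
err = np.abs(f(Xtest) - np.sin(Xtest[:, 0])).max()
assert err < 0.5
```

The regularization parameter mu controls the trade-off between fitting the examples and keeping the hypothesis smooth, which is what makes the algorithm predictive rather than merely interpolating.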
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
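The stability property described in the abstract can be probed numerically for a concrete learning map: delete one training example, retrain, and measure how much the learned hypothesis changes at the deleted point. A minimal sketch, assuming ridge regression as the learning algorithm (the function names, data and measure of change are invented for illustration, not the paper's exact CVloo definition):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Learning map L: training set -> hypothesis (linear weights)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_stability(X, y, lam=1.0):
    """Worst change in prediction when one training example is deleted."""
    w_full = ridge_fit(X, y, lam)
    worst = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w_i = ridge_fit(X[mask], y[mask], lam)
        # change of the hypothesis, measured at the held-out point
        worst = max(worst, abs(X[i] @ w_full - X[i] @ w_i))
    return worst

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
print(loo_stability(X, y))  # small: deleting one example barely moves f_S
```

For a stable algorithm this quantity shrinks as the training set grows, which is the behaviour the paper connects to generalization.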
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data
In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}, i = 1, …, n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0,

  lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

  S = z_1 = (x_1, y_1), …, z_n = (x_n, y_n).

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as

  I[f] = ∫_Z V(f, z) dμ(z),

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

  I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y).

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

  I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

  lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.

An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,

  lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] > ε ) = 0.
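The definitions in Box 1 can be simulated directly: draw S from a known μ, run ERM over a small hypothesis space, and compare the empirical error I_S[f_S] with a Monte-Carlo estimate of the expected error I[f_S]. A sketch (the hypothesis space, distribution and sample sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.linspace(-2, 2, 41)          # hypothesis space: f_c(x) = c * x

def square_loss(c, x, y):
    return (c * x - y) ** 2

def erm(x, y):
    """Pick the hypothesis in H minimizing the empirical error I_S[f]."""
    emp = [square_loss(c, x, y).mean() for c in H]
    return H[int(np.argmin(emp))]

def draw(n):                         # samples from mu: y = x + noise
    x = rng.normal(size=n)
    return x, x + 0.3 * rng.normal(size=n)

x_test, y_test = draw(100_000)       # large sample to approximate I[f]
for n in (10, 100, 1000):
    x, y = draw(n)
    c = erm(x, y)
    gap = abs(square_loss(c, x, y).mean() - square_loss(c, x_test, y_test).mean())
    # |I_S[f_S] - I[f_S]| is typically smaller for larger n
    print(n, round(gap, 4))
```

The shrinking gap between training and expected error as n grows is exactly the generalization property defined above.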
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training Database
• 1,000+ Real, 3,000+ VIRTUAL
• 500,000+ Non-Face Patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human Brain
  – 10^10-10^11 neurons (~1 million flies)
  – 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake amp Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkeys' environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
[Trial sequence: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms]
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
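The alternating template-matching and pooling stages these models use can be caricatured in a few lines: "S" units correlate the image with stored templates, and "C" units take a local maximum over position to buy invariance. A toy numpy sketch (template shapes, image and sizes are invented for illustration, not the actual model parameters):

```python
import numpy as np

def s_layer(image, templates):
    """Tuning ('S') units: correlate each template at every position."""
    h, w = image.shape
    t = templates.shape[1]
    out = np.zeros((len(templates), h - t + 1, w - t + 1))
    for k, tpl in enumerate(templates):
        for i in range(h - t + 1):
            for j in range(w - t + 1):
                out[k, i, j] = np.sum(image[i:i+t, j:j+t] * tpl)
    return out

def c_layer(s_maps, pool=2):
    """Invariance ('C') units: max-pool over local positions."""
    k, h, w = s_maps.shape
    return s_maps.reshape(k, h // pool, pool, w // pool, pool).max(axis=(2, 4))

rng = np.random.default_rng(0)
templates = rng.normal(size=(4, 3, 3))       # 4 small random "feature" templates
img = np.zeros((12, 12)); img[5, 2:9] = 1.0  # a horizontal bar stimulus
c1 = c_layer(s_layer(img, templates))        # position-tolerant feature maps
```

Shifting the bar leaves the strongest pooled response unchanged, which is the kind of position tolerance the max operation is meant to provide.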
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
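The matrix-like read-out idea can be illustrated with simulated data: a simple linear classifier trained on the responses of a small "population" suffices to decode category. A sketch (population size, noise level and the least-squares readout are invented stand-ins for the recorded IT data and the classifier used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 64, 200

# Simulated population: each category evokes a characteristic mean pattern
patterns = rng.normal(size=(2, n_neurons))          # category "tuning"
labels = rng.integers(0, 2, size=n_trials)
responses = patterns[labels] + 0.8 * rng.normal(size=(n_trials, n_neurons))

# Linear readout trained by least squares on +/-1 targets
train, test = slice(0, 150), slice(150, None)
X = np.hstack([responses, np.ones((n_trials, 1))])  # add a bias column
w, *_ = np.linalg.lstsq(X[train], 2.0 * labels[train] - 1.0, rcond=None)
acc = np.mean((X[test] @ w > 0) == (labels[test] == 1))
print(f"readout accuracy: {acc:.2f}")
```

The point is the linearity of the decoder: category information is available to a single weighted sum of the population response.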
… in 2013 …
Relative motion and figure-ground discrimination: the fly
Work by Werner Reichardt (with Poggio and Hausen, and later with M. Egelhaaf and A. Borst)
Motion discontinuities and figure-ground discrimination: neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡

†Università di Genova, Istituto di Fisica, Genoa, Italy. ‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
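The two mechanisms contrasted here lend themselves to a toy simulation: a Hassenstein & Reichardt detector correlates (multiplies) a delayed signal from one receptor with the direct signal from its neighbour, while a Barlow & Levick unit lets the delayed neighbour signal veto its response. A discrete-time sketch (the time constants and stimulus are invented; a first-order low-pass filter stands in for the delay channel):

```python
import numpy as np

def lowpass(x, tau):
    """First-order low-pass filter: the slow 'delay' channel."""
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t-1] + (x[t] - y[t-1]) / tau
    return y

def reichardt(a, b, tau=5.0):
    """Multiplicative (correlation) detector, opponent form:
    positive mean output signals motion in the a -> b direction."""
    return np.mean(lowpass(a, tau) * b - lowpass(b, tau) * a)

def barlow_levick(a, b, tau=5.0):
    """Inhibitory 'veto': the delayed neighbour signal suppresses b."""
    return np.mean(b * (1.0 - np.clip(lowpass(a, tau), 0, 1)))

# A bar moving from receptor a to receptor b: b sees the same pulses later
t = np.arange(200)
a = (np.sin(2 * np.pi * t / 40) > 0.8).astype(float)
b = np.roll(a, 3)

print(reichardt(a, b), reichardt(b, a))  # preferred > 0 > null direction
```

In the correlation scheme the preferred and null outputs are equal and opposite; in the veto scheme the null-direction response is suppressed rather than negated, matching the asymmetry Barlow & Levick observed.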
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
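The aperture argument above reduces to a few lines of linear algebra: each locally measured edge orientation contributes one constraint n · v = c on the velocity, and the full velocity becomes unique only when constraints from differently oriented contour segments are combined. A sketch (the numbers are invented for illustration):

```python
import numpy as np

v_true = np.array([1.0, 0.5])                 # actual velocity of the contour

# A single straight edge: local measurements yield only the normal component
n1 = np.array([0.0, 1.0])                     # unit normal of a horizontal edge
v_normal = (v_true @ n1) * n1                 # measurable part: [0.0, 0.5]
v_tangential = v_true - v_normal              # invisible locally: [1.0, 0.0]

# Two independent constraints (e.g. at a corner) pin down v uniquely
n2 = np.array([np.sqrt(0.5), np.sqrt(0.5)])   # normal of a 45-degree edge
c = np.array([n1 @ v_true, n2 @ v_true])      # the two normal-flow measurements
v = np.linalg.solve(np.stack([n1, n2]), c)
print(v)                                      # recovers v_true = [1.0, 0.5]
```

With only one orientation the system is underdetermined (any velocity on the constraint line fits), which is the ambiguity the regularization framework resolves by adding a smoothness assumption along the contour.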
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
)D4z +HPHgrz gXz 0H]4gPrz R4+dj]4z +T0z4n+Pj+g4zjPg]+dgzZ]X4dd4dzpIgEzjUZ]4t414Ug41z gJR4z ]4dXPjgIXUz +T1z ]4PI+IPIgrzE+dz C]4+gPrz 5qg5U051z Xj]z NUXpP40C5z+Xjgz gE4zNIU4gIdz X7zZ]JR+]rz Z]X4dd4dzIUzE4RIeg]rz+U0z+PPI41zZErdI+Pz+T0zIXuPXCI+Pz dI4U4dz RZ]Xo4R4Ugdz IUz gE4z]5PI+IPIgrz +U0z n5]d+gIPIgrz X7zZIXd4XU1zg4ETIj4dzdEXjP1zP4+1zgXz+TzIU]4+d4zIUzgE4z4qZ4]IR4Ug+PzIVA]R+gIXUz+Xjgz+dIzIUg4]+gIXUdzIUz+gXRIz+U1zRXP4jP+]zdrdvg4Rdz
1313131313$sup2_sup2[sup2cwcurrenmacrw sup2(3KGAK(DDK(sup2sup2$9sup2sup2Jsup2Rsectwcurrensup2 yenw sup2gwGordfordfsup2 hordfw yenshysup2 cndeg
nyenGordfordfsup2T nwsup2$3Csup2mmsup2ntsup2_sup2[sup2cwcurrendegmacrw sup2sup282(ampE2$K134(GK$4C-+Kcsup2Jsup2Zwordfdegwsup2 ntsup2 Vsup2Vbrvbarwsup2 Lt sup2kwshysup2 wlaquosup2 lsup2$3sup2sup2=sup2
sup2Xsup2Hsup2L wcurrennsup2ampampK(3K(AKgtsup2i=sup2$3Csup2Xsup2Vsup2Xnsectynsup2 ntsup2 _sup2[sup2cwcurrenmacrw sup200Ksup2Dsup2 Gsup2Znsectpwnsectsup2 ntsup2 Xn wsup244EK(FKGAK (3K (5sup2=sup2 $3Esup2 csup2csup2Gsup2ntsup2 dsup2Zsup2dnsup2GAK $GK Alt4sup2 sup2Vsectshysup2$3sup2
0sup2_sup2ksup2ecurrensup2[Gsup2Jsectumlnshysup2Lsup2_sup21)58KE$4DE3K132(ampD84K -sup2$sup2 $0sup2
3sup2[sup2csup2gsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup22KGAK1(sup203$sup2$$sup2
6sup2ccsup2Gsup2nusup2dsup2Zsup2dn`regiexclsup2FK(DDK+3=0sup2$Fsup2Osup2Lsup2Hsect qsup2_sup2[sup2cwcurrenmacrw sup2csup2_sup2Vw sup2(3KGAK(DDK 13$=sup2$sup2
sup2Jsup2Rsectwbrvbarsup2Xsup2dcurrennsectpsup2Lsup2sup2Jwwpsup2_sup2[sup2cwcurrenmacrw sup2 sup2wnncurrensup2
=sup2Xsup2dsup2Pwordfwsup2ntsup2Qsup2Lsup2Hsect qsup2 sup2wnncurrensup2sup2Isup2jsup2dnsup2(FK8KGAK1lt8sup2$3sup2
U^sup2Ssup2Lsup2Zw sup2Lsup2Zwcurrensup2ksup2cnsup2(3KGAK(DDKltsup20sup2$sup2
$$sup2Isup2Zsup2gsup2Xsup2Qsect curreno sup2Gsup2Jww sup2DK83H3E4K gtsup2 $sup2$sup2
$sup2Jsup2sup2Jw currenw13sup2 gsup2[laquosup2 csup2cnsup2Qsup2Nsup2gcentsup2K(3K8ampK$$$GK$6AKKsup2$0sup2$sup2
$sup2_sup2[sup2cwcurrenmacrw sup2 csup2_sup2Vw sup2Vsup2Vcurrenwsup2K(3KGAK66sup2$sup2
$0sup2Qsup2Gsup2Xwwshyknnqwsup2ntsup2Isup2Jsup2Vnsup2(3KGBK(DDK sup2$6sup2
$3sup2Jsup2Ssectwcurrensup2ksup2dsup2dcurrensectordfwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Vcurrenwsup2(3KGAK7-sup2$3sup2$sup2
$6sup2Jsup2[ntwsup2 ntsup2 [sup2ksup2kt sup2(3K GAK(DDK+sup2$2sup2$0sup2
$sup2Lsup2_sup2Twsup2Isup2jsup2dnsup2Gsup2Hwnsup200Ksup26$$sup2$6sup2
$=sup2gsup2N currenwsup2 ntsup2 Qsup2R~nsup2K GAK (3K(E(K82(Klt4sup2 6sup2 $$sup2
$sup2Vsup2Nsup2Twntsup2 ntsup2_sup2Gsup2Ssup2kshyncurrencurrensup2FKGAKK (3K $$sup2$6sup2
sup2Jsup2ordfsup2twsup2 Ztwsup2ntsup2 Xsup2Nsup2ctw sup2KKE$4DE3K 132(ampD87KbMAsup2 6sup2$sup2
$sup2Qsup2Lsup2Hsect qsup2[sup2Zsup2Gwpsectshysup2Gsup2Znnsup2_sup2[sup2cwyenmacrw `rsup2$D2Kamp$Kamp0KK7A=sup2$sup2
sup2gsup2Zsup2xmacrwsup2_sup2[sup2cwcurrenmacrw sup2Vsup2Zwsup2amp0(4amp(K13 =sup2$sup2
sup2Wsup2 dsup2Zwsup2Vsup2gsup2Zsup2wcurrenmacrwsup2_sup2Zsup2Jsectcurrencurrensup2_ cwcurrenmacrw sup2K(DDK$6sup2$0sup2
13
13
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term "cooperative" refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints, (ii) describe a cooperative algorithm that implements this computation, and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
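The cooperative algorithm the paper describes can be illustrated in miniature. The sketch below is a simplified, hypothetical rendering of the idea, not the paper's exact network: it applies the two local constraints (uniqueness, as inhibition across disparities at the same position; continuity, as excitatory support along the same disparity) to a 1-D random-dot stereogram. All parameter values are illustrative.

```python
import random

random.seed(0)

N, D, TRUE_D = 40, 3, 2            # width, number of disparity layers, planted disparity

# 1-D random-dot stereogram: the right image is the left image shifted by TRUE_D
left = [random.randint(0, 1) for _ in range(N)]
right = [left[(x - TRUE_D) % N] for x in range(N)]

# initial match network: C[x][d] = 1 where left(x) is compatible with right(x+d)
C0 = [[1 if left[x] == right[(x + d) % N] else 0 for d in range(D)] for x in range(N)]
C = [row[:] for row in C0]

EXC, EPS, THETA = 2, 1.5, 3.0      # excitatory radius, inhibition weight, threshold

for _ in range(8):
    new = [[0] * D for _ in range(N)]
    for x in range(N):
        for d in range(D):
            # continuity: excitatory support from same-disparity neighbours
            support = sum(C[(x + k) % N][d] for k in range(-EXC, EXC + 1))
            # uniqueness: inhibition from other disparities at the same position
            inhibit = sum(C[x][dp] for dp in range(D) if dp != d)
            new[x][d] = 1 if support - EPS * inhibit + C0[x][d] >= THETA else 0
    C = new

# the surviving layer with the most support is the recovered disparity
recovered = max(range(D), key=lambda d: sum(C[x][d] for x in range(N)))
print(recovered)
```

The false matches (chance agreements at the wrong disparity) lose their excitatory support over iterations, while the planted disparity plane sustains itself, which is the cooperative behaviour the paper demonstrates on random-dot stereograms.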
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \,\|f\|_K^2 \right]$$
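When H is a reproducing-kernel Hilbert space, the regularization functional above can be minimized in closed form: by the representer theorem the minimizer is f(x) = Σ_j c_j K(x, x_j), and for the square loss the coefficients solve (K + nμI)c = y. A minimal self-contained sketch with a Gaussian kernel on toy data (all names and parameter values here are illustrative, not from the slide):

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    return math.exp(-(a - b) ** 2 / (2 * sigma ** 2))

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for the linear system A c = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c

# training set: samples of a smooth target function
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]
n, mu = len(xs), 1e-3

# representer theorem: the minimizer is f(x) = sum_j c_j K(x, x_j), (K + n*mu*I) c = y
K = [[gaussian_kernel(xi, xj) for xj in xs] for xi in xs]
A = [[K[i][j] + (n * mu if i == j else 0.0) for j in range(n)] for i in range(n)]
c = solve(A, ys)

def f(x):
    return sum(cj * gaussian_kernel(x, xj) for cj, xj in zip(c, xs))
```

The regularization weight μ trades data fit against the smoothness (RKHS norm) of the solution; with small μ the fit nearly interpolates the training points.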
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1–49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory¹⁻⁵ was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
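The stability property is easy to probe numerically. The toy sketch below is a hypothetical illustration, not the paper's construction: it fits a one-parameter ridge regressor (a stable, non-ERM-only algorithm) and measures how much the prediction at a training point changes when that point is deleted. The change shrinks as the training set grows, which is the leave-one-out stability the abstract describes.

```python
import random

random.seed(1)

def ridge_fit(pts, lam=0.1):
    """Closed-form 1-D ridge regression through the origin:
    w = sum(x*y) / (sum(x^2) + lam*n), i.e. ERM plus a stabilizing penalty."""
    n = len(pts)
    return sum(x * y for x, y in pts) / (sum(x * x for x, _ in pts) + lam * n)

def loo_stability(n):
    """Largest change in the prediction at a training point when that point
    is deleted from the training set (leave-one-out stability)."""
    pts = [(x, 2.0 * x + random.gauss(0, 0.1))
           for x in (random.uniform(0, 1) for _ in range(n))]
    w_full = ridge_fit(pts)
    return max(abs((w_full - ridge_fit(pts[:i] + pts[i + 1:])) * x)
               for i, (x, _) in enumerate(pts))

s_small, s_large = loo_stability(20), loo_stability(200)
print(s_small, s_large)  # the perturbation shrinks with more data
```

The data model (y = 2x plus Gaussian noise) and all parameter values are invented for illustration.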
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
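ERM can be made concrete with the smallest possible hypothesis space. The sketch below is an illustrative example (the hypothesis space and data are invented): it minimizes the empirical error over 1-D threshold classifiers h_t(x) = 1 if x ≥ t, else 0.

```python
def erm_threshold(S):
    """ERM over the hypothesis space of 1-D threshold classifiers:
    scan candidate thresholds and return the one with lowest training error."""
    candidates = sorted({x for x, _ in S})

    def emp_error(t):
        return sum(1 for x, y in S if (1 if x >= t else 0) != y) / len(S)

    return min(candidates, key=emp_error)

# toy training set: inputs below ~0.5 labelled 0, above labelled 1
S = [(0.1, 0), (0.3, 0), (0.45, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
t = erm_threshold(S)
print(t)  # the threshold that separates the two labels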
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,
$$\lim_{n \to \infty} P\big( |X_n - X| > \varepsilon \big) = 0$$
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
$$S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}$$
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map
$$L : \bigcup_{n \ge 1} Z^n \to H$$
where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.
Expected error. The expected error of a function f is defined as
$$I[f] = \int_Z V(f, z) \, d\mu(z)$$
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y)$$
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
$$\lim_{n \to \infty} \big| I[f_S] - I_S[f_S] \big| = 0 \ \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
$$\lim_{n \to \infty} P\Big( I[f_S] - \inf_{f \in H} I[f] > \varepsilon \Big) = 0$$
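The relation between empirical and expected error in Box 1 can be checked by simulation. In the sketch below (an illustration with an invented target and hypothesis, not from the letter), the expected square-loss error of a fixed f has a closed form, and the empirical error on a large fresh sample approaches it, as the definitions suggest.

```python
import math
import random

random.seed(2)

f = lambda x: 0.5 * x            # the fixed hypothesis being evaluated
target = lambda x: math.sin(x)   # "true" regression function behind the data

def empirical_error(n):
    """I_S[f] = (1/n) * sum_i V(f, z_i) with square loss, on a fresh sample of
    size n drawn as x ~ Uniform(0, pi), y = sin(x) + Gaussian noise (sd 0.1)."""
    total = 0.0
    for _ in range(n):
        x = random.uniform(0, math.pi)
        y = target(x) + random.gauss(0, 0.1)
        total += (f(x) - y) ** 2
    return total / n

# for this toy setup the expected error can be computed analytically:
# I[f] = (1/pi) * integral_0^pi (x/2 - sin x)^2 dx + noise variance
expected = (math.pi ** 3 / 12 - math.pi + math.pi / 2) / math.pi + 0.01
print(empirical_error(100000), expected)
```

This is only the easy half of the story (one fixed f); generalization as defined in Box 1 concerns the data-dependent f_S, which is where stability matters.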
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio, 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson, 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka, 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
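The prediction that view-tuned units can together support view-invariant recognition can be sketched as a tiny RBF-style population, in the spirit of the view-based model; all numbers below are illustrative, not fitted to the physiology.

```python
import math

def view_tuned_unit(preferred, sigma=20.0):
    """A unit whose response falls off as the object rotates away
    from its preferred view (angles in degrees, circular distance)."""
    def response(theta):
        d = min(abs(theta - preferred), 360 - abs(theta - preferred))
        return math.exp(-d ** 2 / (2 * sigma ** 2))
    return response

# store a few trained views of one object; the ensemble tiles the view circle
stored_views = [0, 60, 120, 180, 240, 300]
units = [view_tuned_unit(v) for v in stored_views]

def recognize(theta, threshold=0.3):
    # object-level decision: any view-tuned unit responding strongly enough
    return max(u(theta) for u in units) >= threshold
```

Each unit behaves like the "blurred template for a limited neighborhood of a single view" described above, while the max over the ensemble is view-invariant.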
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Image: 20 ms
Image–mask interval (ISI): 30 ms
Mask (1/f noise): 80 ms
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
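The "matrix-like read-out" can be sketched as a simple linear classifier applied to a population response vector. The code below uses an invented toy population, not the recorded IT data; a nearest-centroid rule stands in for the linear classifiers used in the readout work.

```python
import random

random.seed(3)

N = 50  # simulated neural population size
# each unit's mean response to category 0 and category 1 (its "tuning")
tuning = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(N)]

def response(cat, noise=0.2):
    """One noisy population vector; the noise stands in for the variability
    (e.g. position and scale changes) the category signal must survive."""
    return [t[cat] + random.gauss(0, noise) for t in tuning]

# linear readout: nearest class centroid, i.e. a linear discriminant
train = [(response(c), c) for c in (0, 1) for _ in range(20)]
centroids = []
for c in (0, 1):
    vecs = [v for v, cc in train if cc == c]
    centroids.append([sum(col) / len(vecs) for col in zip(*vecs)])

def decode(v):
    dists = [sum((vi - ci) ** 2 for vi, ci in zip(v, cen)) for cen in centroids]
    return dists.index(min(dists))

accuracy = sum(decode(response(c)) == c for c in (0, 1) for _ in range(50)) / 100
print(accuracy)
```

The point of the sketch is that category information distributed across a population can be decoded by a weight vector, without any single "category neuron".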
… in 2013 …
Motion discontinuities and figure-ground discrimination neural circuitry
Towards the neural circuitry: Reichardt, Poggio, Hausen 1983
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409–416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
†Università di Genova, Istituto di Fisica, Genoa, Italy
‡Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
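The multiplicative (Hassenstein–Reichardt) correlation scheme is easy to state in code. The sketch below is a textbook opponent correlation detector, not the specific circuit model of this paper; the delay and stimulus are illustrative.

```python
import math

def reichardt(signal_a, signal_b, delay=1):
    """Opponent correlation detector: multiply each input with the delayed
    other input and subtract the mirror pair, giving a direction-signed output."""
    out = 0.0
    for t in range(delay, len(signal_a)):
        out += signal_a[t - delay] * signal_b[t] - signal_b[t - delay] * signal_a[t]
    return out

# stimulus drifting from receptor A to receptor B:
# B sees the same waveform one time step later
wave = [math.sin(0.3 * t) for t in range(200)]
a = wave
b = [0.0] + wave[:-1]

rightward = reichardt(a, b)   # motion in the preferred direction: positive
leftward = reichardt(b, a)    # motion in the null direction: negative
print(rightward, leftward)
```

The multiplication is the essential nonlinearity the text describes: a purely linear interaction could not produce a direction-signed mean output.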
© 1985 Nature Publishing Group
Computational vision and regularization theory Tomaso Poggio Vincent Torre amp Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch, and to Barrow and Tenenbaum's intrinsic images⁵. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6], one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
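The decomposition just described, the "aperture problem", can be written in a few lines. A minimal sketch (ours, not the paper's code; the function name and test velocities are illustrative):

```python
import numpy as np

# Illustration of the aperture problem: a purely local measurement along a
# smooth contour recovers only the component of the velocity normal to the
# contour; the tangential component is invisible to it.

def normal_flow(true_velocity, tangent):
    """Return the observable normal component of a 2-D image velocity."""
    t = np.asarray(tangent, dtype=float)
    t = t / np.linalg.norm(t)
    n = np.array([-t[1], t[0]])           # unit normal to the contour
    return np.dot(true_velocity, n) * n   # the only locally measurable part

v_true = np.array([1.0, 0.5])             # actual image velocity
measured = normal_flow(v_true, tangent=[1.0, 0.0])
print(measured)                           # [0.  0.5] -- tangential part lost
```

Any velocity with the same normal component produces identical local measurements, which is exactly why extra constraints are needed to recover the full flow.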
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the…
Proc R Soc Lond B 202 409-416 (1978) Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
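The contrast between the two schemes can be made concrete with a toy simulation (our sketch; the delay stand-in and signal shapes are illustrative, not the authors'): a Reichardt-style detector multiplies a delayed channel with a direct one, while a Barlow-Levick-style detector lets excitation through unless a delayed inhibitory signal 'vetoes' it.

```python
import numpy as np

def reichardt(r1, r2, delay=1):
    # Correlation detector: multiply delayed channel with the direct one,
    # in an opponent (mirror-symmetric) arrangement.
    d1, d2 = np.roll(r1, delay), np.roll(r2, delay)
    return float(np.sum(d1 * r2 - d2 * r1))

def barlow_levick(r1, r2, delay=1):
    # Veto scheme: excitation from receptor 2 passes unless suppressed by
    # delayed inhibition from receptor 1 (AND-NOT gate).
    veto = np.roll(r1, delay)
    return float(np.sum(np.maximum(r2 - veto, 0.0)))

# A pulse moving from receptor 1 to receptor 2:
r1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
r2 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
print(reichardt(r1, r2))      # 1.0: positive sign signals this direction
print(barlow_levick(r1, r2))  # 0.0: the delayed veto nulls this direction
print(barlow_levick(r2, r1))  # 1.0: motion the other way passes the gate
```

Both detectors are direction-selective; they differ in whether the nonlinearity is a multiplication (correlation) or an inhibitory veto, which is exactly the distinction drawn in the text.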
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tuebingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of…
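A toy 1-D cooperative disparity computation in this spirit can be sketched as follows (our sketch; the weights, threshold, iteration count and neighbourhoods are illustrative choices, not the paper's): a binary state C[x, d] over image position x and candidate disparity d is iterated with excitatory support within a disparity plane and inhibition among competing disparities at the same position.

```python
import numpy as np

def cooperate(C, iters=10, excit=2.0, inhib=1.0, theta=1.5):
    C = C.astype(float)
    for _ in range(iters):
        # Excitatory support: neighbours at the same disparity.
        E = np.roll(C, 1, axis=0) + np.roll(C, -1, axis=0)
        # Inhibitory pool: all rival disparities at the same position.
        I = C.sum(axis=1, keepdims=True) - C
        # Threshold the summed local evidence to get the next binary state.
        C = (excit * E - inhib * I + C > theta).astype(float)
    return C

C0 = np.zeros((20, 5))
C0[:, 2] = 1.0                  # true matches along disparity d = 2
C0[3, 0] = C0[7, 4] = 1.0       # two spurious matches
C = cooperate(C0)
print(C[:, 2].mean(), C.sum())  # 1.0 20.0 -- only the true plane survives
```

Isolated spurious matches lack excitatory support and are suppressed by the inhibition, while the consistent disparity plane reinforces itself, a miniature of the global-order-from-local-constraints behaviour described above.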
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in H}\left[\frac{1}{n}\sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2\right]$$
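The regularization functional minimized here, empirical loss plus a penalty mu * ||f||_K^2 on the RKHS norm, has a closed-form minimizer for the square loss via the representer theorem. A minimal sketch (ours; the Gaussian kernel and the values of mu and sigma are illustrative choices):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.1):
    # Pairwise Gaussian kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=1e-3, sigma=0.1):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over the RKHS."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    # Representer theorem: f(x) = sum_j c_j K(x, x_j), with (K + mu*n*I) c = y.
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

X = np.linspace(0.0, 1.0, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
f = rls_fit(X, y)
print(abs(f(X) - y).max() < 0.2)   # True: a smooth fit close to the data
```

The regularization parameter mu trades data fidelity against smoothness, the same trade-off that the regularization approach exploits for ill-posed early vision problems.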
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science & Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (written $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P\bigl(|X_n - X| \geq \epsilon\bigr) = 0.$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \cup_{n\geq 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n}\sum_{i=1}^{n} V(f, z_i).$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}.$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} P\Bigl(I[f_S] \geq \inf_{f \in H} I[f] + \epsilon\Bigr) = 0.$$
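The stability property the letter builds on, delete one training example and check that the learned hypothesis barely moves, can be probed numerically. A sketch (ours, not the paper's code; data and parameters are illustrative) using ridge regression, whose strong convexity makes it stable in this leave-one-out sense:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Closed-form ridge regression: (X'X + lam I) w = X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w_full = fit_ridge(X, y)
# Leave-one-out perturbations: refit with each of the first 20 points deleted.
changes = [np.linalg.norm(w_full - fit_ridge(np.delete(X, i, axis=0),
                                             np.delete(y, i, axis=0)))
           for i in range(20)]
print(max(changes) < 0.1)   # True: deleting one point barely moves the solution
```

The change in the solution scales roughly like 1/n here, which is the kind of stability that, per the letter, suffices for generalization.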
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns
• 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human brain:
- 10^10-10^11 neurons (~1 million flies)
- 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
- ~10^9 neurons in the ventral stream (350 x 10^6 in each hemisphere)
- ~15 x 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the 'complexity' of the preferred stimulus, and in 'invariance' to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning, and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
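A viewer-centered scheme of the kind later formalized by Poggio and Edelman can be sketched as interpolation among stored views: units tuned to familiar views feed a weighted sum, so the object-level response generalizes to nearby, unfamiliar views. The feature vectors, sigma, and weights below are toy choices for illustration, not the published network.

```python
import numpy as np

def gaussian_unit(view, stored_view, sigma=0.5):
    # Response of a view-tuned unit: maximal at its stored view,
    # falling off smoothly as the input view rotates away.
    d = np.linalg.norm(view - stored_view)
    return np.exp(-d**2 / (2 * sigma**2))

def object_response(view, stored_views, weights, sigma=0.5):
    # Object-level output: weighted sum over a small set of
    # view-tuned units (interpolation between familiar views).
    acts = np.array([gaussian_unit(view, v, sigma) for v in stored_views])
    return weights @ acts

# Toy "views": feature vectors of one object seen from 3 angles.
stored = [np.array([np.cos(a), np.sin(a)]) for a in (0.0, 0.6, 1.2)]
w = np.ones(3)

# Response is high for an intermediate, unfamiliar view ...
mid = np.array([np.cos(0.3), np.sin(0.3)])
# ... and low for a very different view.
far = np.array([np.cos(3.0), np.sin(3.0)])
print(object_response(mid, stored, w) > object_response(far, stored, w))  # True
```

The point of the sketch is only that a few view-tuned units suffice for view generalization in a neighborhood of the stored views, which is the behavior the Logothetis recordings later found in IT.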
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Trial timeline: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms.
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
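The alternating architecture behind these models can be sketched in a few lines: "S" layers do template matching and "C" layers pool with a max, which buys invariance to small shifts. The templates and pooling sizes below are toy values, not the published model's parameters.

```python
import numpy as np

def s_layer(image, templates):
    # Template matching ("simple" units): each unit responds to how
    # well a local patch matches its preferred template.
    h, w = image.shape
    th, tw = templates[0].shape
    out = np.zeros((len(templates), h - th + 1, w - tw + 1))
    for k, t in enumerate(templates):
        for i in range(h - th + 1):
            for j in range(w - tw + 1):
                patch = image[i:i+th, j:j+tw]
                out[k, i, j] = -np.sum((patch - t) ** 2)  # similarity
    return out

def c_layer(s_maps, pool=2):
    # Max pooling ("complex" units): invariance to small shifts comes
    # from taking the max over a local neighborhood.
    k, h, w = s_maps.shape
    out = np.zeros((k, h // pool, w // pool))
    for i in range(h // pool):
        for j in range(w // pool):
            out[:, i, j] = s_maps[:, i*pool:(i+1)*pool, j*pool:(j+1)*pool].max(axis=(1, 2))
    return out

# A bar template, and an image containing the bar at two positions.
bar = np.zeros((2, 2)); bar[0, :] = 1.0
img1 = np.zeros((6, 6)); img1[1, 1:3] = 1.0
img2 = np.zeros((6, 6)); img2[2, 2:4] = 1.0   # shifted by one pixel

c1 = c_layer(s_layer(img1, [bar]))
c2 = c_layer(s_layer(img2, [bar]))
print(c1.max() == c2.max())  # True: pooled response unchanged by the shift
```

Stacking several such S/C pairs, with templates learned at each level, gives the hierarchy used in the models above.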
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
... in 2013 ...
Relative motion
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons...
Work at 3 levels
bull Fixation and tracking behavior of the fly
bull Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
bull Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary: those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
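A correlation detector of the Hassenstein-Reichardt type can be caricatured in a few lines: each subunit multiplies one receptor's signal with a delayed copy of its neighbor's, and the opponent subtraction of the mirror-image subunit makes the output signed by direction. The delay and the stimulus here are toy values, not fly physiology.

```python
import numpy as np

def reichardt(left, right, delay=1):
    # Correlation-type motion detector: multiply each receptor's
    # signal with the *delayed* signal of its neighbor, then
    # subtract the mirror-symmetric subunit.
    d_left = np.roll(left, delay)    # delayed copy of left receptor
    d_right = np.roll(right, delay)  # delayed copy of right receptor
    d_left[:delay] = 0.0
    d_right[:delay] = 0.0
    return np.sum(d_left * right - d_right * left)

# A pulse moving left-to-right hits the right receptor one step later.
t = np.arange(10)
left = (t == 3).astype(float)
right = (t == 4).astype(float)

print(reichardt(left, right))   # 1.0: preferred direction
print(reichardt(right, left))   # -1.0: null direction
```

Replacing the multiplication with a divisive 'veto' of one channel by the delayed other gives the Barlow-Levick variant; the Torre-Poggio paper's point is that shunting synaptic inhibition can implement exactly such a nonlinear interaction.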
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems: problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. Because so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6],
one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity; the tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
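The locally available normal component follows from the brightness-constancy constraint I_x u + I_y v + I_t = 0: only the flow component along the intensity gradient, v_n = -I_t / |∇I|, is constrained by local measurements. A sketch on a translating luminance ramp (discrete gradients on a toy image):

```python
import numpy as np

def normal_flow(I0, I1):
    # Brightness constancy: Ix*u + Iy*v + It = 0. Only the component
    # of (u, v) along the gradient is constrained locally:
    #   v_n = -It / |grad I|   (the aperture problem).
    Ix = np.gradient(I0, axis=1)
    Iy = np.gradient(I0, axis=0)
    It = I1 - I0
    mag = np.hypot(Ix, Iy)
    vn = np.where(mag > 1e-6, -It / np.maximum(mag, 1e-6), 0.0)
    return vn

# A horizontal luminance ramp translated one pixel to the right:
x = np.arange(8, dtype=float)
I0 = np.tile(x, (8, 1))
I1 = np.tile(x - 1.0, (8, 1))  # same ramp, shifted right by 1
vn = normal_flow(I0, I1)
print(vn[4, 4])  # 1.0: the normal speed recovers the horizontal shift
```

For this stimulus the gradient happens to be aligned with the motion, so the normal component is the full velocity; for a contour moving obliquely to its gradient, the tangential part would remain invisible, which is exactly the ambiguity the regularization approach is designed to resolve.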
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we first analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; second, describe a cooperative algorithm that implements this computation; and third, exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct 15, 1976), pp. 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
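The cooperative algorithm in this paper couples excitation along a surface of constant disparity (the continuity constraint) with inhibition between competing disparities at the same position (the uniqueness constraint), iterated with a threshold. The following is a 1-D caricature of that scheme; the threshold, inhibition weight, neighborhood radius and iteration count are illustrative choices, not the paper's exact network.

```python
import numpy as np

def cooperative_stereo(init, iters=10, theta=3.0, eps=2.0, radius=2):
    # 1-D caricature of a cooperative disparity network: a binary
    # state C[x, d] for each position x and candidate disparity d.
    # Excitation comes from neighbors at the SAME disparity
    # (continuity); inhibition from other disparities at the same
    # position (uniqueness). Thresholding gives the next state.
    C = init.astype(float).copy()
    X, D = C.shape
    for _ in range(iters):
        new = np.zeros_like(C)
        for x in range(X):
            for d in range(D):
                lo, hi = max(0, x - radius), min(X, x + radius + 1)
                excite = C[lo:hi, d].sum() - C[x, d]
                inhibit = C[x, :].sum() - C[x, d]
                s = excite - eps * inhibit + init[x, d]
                new[x, d] = 1.0 if s >= theta else 0.0
        C = new
    return C

# Initial matches: a consistent surface at disparity d=1 everywhere,
# plus two spurious isolated matches at other disparities.
X, D = 12, 3
init = np.zeros((X, D))
init[:, 1] = 1.0
init[3, 0] = 1.0
init[8, 2] = 1.0
C = cooperative_stereo(init)
print(C[:, 1].sum(), C[:, 0].sum(), C[:, 2].sum())  # 12.0 0.0 0.0
```

The isolated false matches receive no support from neighbors at their disparity and are suppressed by the active unit on their line of sight, while the consistent surface reinforces itself, which is the qualitative behavior the paper demonstrates on random-dot stereograms.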
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \mu \, \|f\|_K^2 \right]$$
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
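The delete-one-example notion of stability can be made concrete with the simplest possible learning map, the sample mean, used here purely as a toy stand-in for the algorithms the paper treats: removing one training point perturbs the hypothesis on the order of 1/n, so stability improves with the size of the training set.

```python
import numpy as np

def fit_mean(y):
    # A maximally simple "learning map": the hypothesis is the
    # constant function equal to the sample mean of the outputs.
    return y.mean()

def loo_stability(y):
    # Largest change in the hypothesis when one example is deleted
    # (for the mean this equals max_i |y_i - mean| / (n - 1)).
    full = fit_mean(y)
    return max(abs(full - fit_mean(np.delete(y, i))) for i in range(len(y)))

rng = np.random.default_rng(1)
small = rng.standard_normal(10)
large = rng.standard_normal(1000)

# Deleting one of 1000 examples moves the hypothesis far less than
# deleting one of 10: the learning map is more stable at larger n.
print(loo_stability(small) > loo_stability(large))  # True
```

The paper's result is the converse direction of this intuition: a suitably quantified version of this stability property is what guarantees that the empirical error tracks the expected error.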
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}, i = 1, …, n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
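The ERM rule just described can be sketched in a few lines. In this minimal sketch the square loss, the toy data and the hypothesis space (a handful of candidate slopes) are illustrative choices, not anything from the letter itself:

```python
# Empirical risk minimization (ERM) sketch: given a training set S and a
# finite hypothesis space H, select the function with lowest training error.
# Hypothesis space and data here are illustrative, not from the paper.

def empirical_error(f, S):
    """I_S[f] = (1/n) * sum of square losses V(f, z_i) over the training set."""
    return sum((f(x) - y) ** 2 for x, y in S) / len(S)

def erm(H, S):
    """Return the hypothesis in H minimizing the empirical error on S."""
    return min(H, key=lambda f: empirical_error(f, S))

# Toy example: H = candidate slopes for f(x) = a * x
S = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
H = [lambda x, a=a: a * x for a in (0.5, 1.0, 2.0, 3.0)]
f_S = erm(H, S)
print(f_S(1.0))  # the best slope here is a = 2.0
```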
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (written lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,
    lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
    S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map
    L : ∪_{n≥1} Z^n → H,
where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote by V(f, z) the price we pay when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as
    I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
    I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
    I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
    lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
    lim_{n→∞} P( I[f_S] ≥ inf_{f∈H} I[f] + ε ) = 0.
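The stability property invoked in the letter, that deleting one training example should barely change the learned hypothesis, can be probed numerically. Below is a minimal sketch, assuming square loss and the simplest stable learner (the empirical mean, i.e. ERM over constant functions); the data are made up for illustration:

```python
# Leave-one-out stability sketch: train on S and on S with one example
# deleted, and compare the two hypotheses. For illustration the "hypothesis"
# is the empirical mean, i.e. the constant function minimizing the square
# loss; its output changes by O(1/n) when a single point is removed.

def learn(S):
    """ERM over constant functions under square loss = mean of the outputs."""
    return sum(y for _, y in S) / len(S)

def loo_stability(S):
    """Max change in the hypothesis when any single example is deleted."""
    f_S = learn(S)
    return max(abs(f_S - learn(S[:i] + S[i + 1:])) for i in range(len(S)))

S = [(float(i), float(i % 3)) for i in range(30)]
print(loo_stability(S))      # small for n = 30
print(loo_stability(S[:5]))  # larger for n = 5: stability improves with n
```

The decay of this quantity with n is exactly the kind of behaviour the letter's stability condition formalizes.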
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns
• 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
On the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
  – ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
[Trial timing: image 20 ms; image-mask interval (ISI) 30 ms; mask (1/f noise) 80 ms]
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
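The alternation of template matching and max pooling used by these hierarchical feedforward models (e.g. HMAX) can be caricatured in one dimension. Everything below, the "edge" template, the toy signals and the pool size, is an illustrative assumption, not the models' actual parameters:

```python
# HMAX-flavoured sketch of a feedforward hierarchy: alternating
# template-matching ("simple", S) and max-pooling ("complex", C) stages.
# Filters and inputs are toy values, not the published model's parameters.

def s_layer(signal, template):
    """Simple-cell stage: correlate the input with a template at every position."""
    k = len(template)
    return [sum(signal[i + j] * template[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def c_layer(responses, pool):
    """Complex-cell stage: max-pool over a local neighbourhood,
    giving tolerance to position (the source of invariance in the model)."""
    return [max(responses[i:i + pool])
            for i in range(0, len(responses) - pool + 1, pool)]

edge = [-1.0, 1.0]                       # toy "edge detector" template
signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # a step edge
s1 = s_layer(signal, edge)
c1 = c_layer(s1, pool=2)
shifted = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
c1_shifted = c_layer(s_layer(shifted, edge), pool=2)
print(c1, c1_shifted)  # the pooled response tolerates a one-pixel shift
```

Stacking such S/C pairs yields responses that are increasingly selective yet increasingly position- and scale-tolerant, the trend the Kobatake & Tanaka slide describes along V1, V2, V4, IT.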
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
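A "matrix-like readout" of this kind is, at its core, a simple classifier applied to a vector of population responses. The sketch below uses a nearest-centroid decoder on invented firing rates; it is not the classifier or the data of Hung et al. 2005, just the shape of the computation:

```python
# Sketch of a linear "readout" from a neural population: decode a category
# label from a vector of firing rates. Decoder (nearest centroid) and data
# are illustrative stand-ins for the paper's recordings and classifier.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def train_readout(trials):
    """trials: list of (population_response, label) -> per-class centroids."""
    classes = {}
    for response, label in trials:
        classes.setdefault(label, []).append(response)
    return {label: centroid(vs) for label, vs in classes.items()}

def decode(readout, response):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(response, c))
    return min(readout, key=lambda label: dist(readout[label]))

# Toy population of 3 "neurons": faces drive neuron 0, objects drive neuron 2.
trials = [([9.0, 2.0, 1.0], "face"), ([8.0, 3.0, 2.0], "face"),
          ([1.0, 2.0, 9.0], "object"), ([2.0, 3.0, 8.0], "object")]
readout = train_readout(trials)
print(decode(readout, [7.0, 2.0, 2.0]))  # -> "face"
```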
… in 2013 …
Hermann Cuntz, Jürgen Haag and Alexander Borst, 2003
Two of the neurons…
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain

A synaptic mechanism possibly underlying directional selectivity to motion

By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany

(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)

A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.

Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.

Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
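The multiplicative scheme of Hassenstein & Reichardt can be written down directly: correlate one receptor's delayed signal with its neighbour's current signal, and subtract the mirror-image term to obtain a signed direction signal. The stimulus and delay below are toy choices for illustration:

```python
# Sketch of a Hassenstein-Reichardt correlation detector: two adjacent
# receptors, one channel delayed, a multiplicative interaction, and
# subtraction of the mirror-symmetric pair. Toy stimulus, illustrative delay.

def reichardt(left, right, delay=1):
    """Summed opponent output: positive for left-to-right motion."""
    total = 0.0
    for t in range(delay, len(left)):
        # delayed-left x right  minus  delayed-right x left
        total += left[t - delay] * right[t] - right[t - delay] * left[t]
    return total

# A pulse passing the left receptor first, then the right one:
left_first = ([0, 1, 0, 0], [0, 0, 1, 0])
print(reichardt(*left_first))        # positive: rightward motion
print(reichardt(*left_first[::-1]))  # negative: same stimulus reversed
```

The Barlow & Levick circuit discussed above replaces the multiplication with an asymmetrically delayed inhibitory "veto"; the opponent structure is analogous.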
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.

COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.

Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.

A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
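The regularization recipe, turning an ill-posed problem into a well-posed one by adding a stabilizing term, can be illustrated on a toy 1D reconstruction: minimize a data-fit term plus λ times a smoothness term. The data, λ and the gradient-descent step size below are illustrative assumptions, not the paper's formulation of any specific module:

```python
# Regularization sketch: stabilize an ill-posed early-vision problem by
# minimizing  sum_i (f_i - d_i)^2  +  lam * sum_i (f_{i+1} - f_i)^2,
# i.e. a data-fit term plus a smoothness penalty, here for a toy 1D
# "surface reconstruction" from noisy samples, solved by gradient descent.

def reconstruct(d, lam=2.0, steps=2000, lr=0.1):
    f = list(d)  # start from the data
    for _ in range(steps):
        g = [2.0 * (f[i] - d[i]) for i in range(len(f))]  # data-fit gradient
        for i in range(len(f) - 1):
            diff = 2.0 * lam * (f[i + 1] - f[i])          # smoothness gradient
            g[i] -= diff
            g[i + 1] += diff
        f = [f[i] - lr * g[i] for i in range(len(f))]
    return f

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]  # high-frequency noise around its mean
smooth = reconstruct(noisy)
print(smooth)  # values pulled toward a smooth profile near the mean
```

Larger λ trades fidelity to the data for smoothness; choosing it is part of making the problem well-posed.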
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images⁵. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only

Examples of early vision processes:
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour

generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.

A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman⁶, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
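The aperture problem just described has a one-line quantitative core: under brightness constancy, local space-time derivatives determine only the velocity component along the spatial gradient, v_n = -I_t / |∇I|. A toy check, with invented numbers:

```python
# Aperture-problem sketch: from local measurements of a moving intensity
# pattern, only the component of velocity along the spatial gradient
# (the "normal" component) is recoverable:  v_n = -I_t / |grad I|.
# The tangential component drops out. Numbers below are illustrative.

def normal_flow(Ix, Iy, It):
    """Normal component of image velocity from space-time derivatives."""
    mag = (Ix ** 2 + Iy ** 2) ** 0.5
    return -It / mag

# A vertical edge (gradient purely in x) translating with velocity (2, 5):
# brightness constancy gives I_t = -(Ix*vx + Iy*vy).
Ix, Iy = 3.0, 0.0
vx, vy = 2.0, 5.0
It = -(Ix * vx + Iy * vy)
print(normal_flow(Ix, Iy, It))  # recovers vx = 2.0; vy is invisible locally
```

Recovering the full field (vx, vy) then requires exactly the extra constraints, such as smoothness, that the regularization framework supplies.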
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about …, whereas for the mammalian cortex it lies between … and …

Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of
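A toy one-dimensional sketch of the cooperative disparity algorithm described above: matches excite neighbors within the same disparity plane, competing disparities at the same position inhibit each other, and a threshold is applied at each iteration. This is a simplification of the published two-dimensional algorithm; the neighborhood size, inhibition weight `epsilon`, and threshold `theta` are illustrative choices, not values from the paper.

```python
import numpy as np

def cooperative_stereo(left, right, d_max=2, n_iter=8, epsilon=1.0, theta=3.0):
    """Toy 1-D cooperative stereo sketch (after Marr & Poggio, 1976).

    State C[x, d] = 1 means 'match at position x with disparity d'.
    Excitation: active neighbors along x at the same disparity.
    Inhibition: other active disparities at the same position.
    """
    n = len(left)
    # Initial state: every possible match between equal intensity values.
    C = np.zeros((n, 2 * d_max + 1))
    for x in range(n):
        for d in range(-d_max, d_max + 1):
            if 0 <= x + d < n and left[x] == right[x + d]:
                C[x, d + d_max] = 1.0
    C0 = C.copy()
    for _ in range(n_iter):
        # Excitatory support: +-2 neighborhood at the same disparity (circular).
        excit = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))
        # Inhibition: competing active disparities at the same position.
        inhib = C.sum(axis=1, keepdims=True) - C
        # Threshold update, biased toward the initial matches.
        C = ((excit - epsilon * inhib + C0) >= theta).astype(float)
    return C
```

On a random-dot pair in which the right signal is the left shifted by one pixel, the disparity +1 plane comes to dominate after a few iterations while spurious matches die out.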
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URL: http://links.jstor.org/sici?sici=0036-8075(19761015)3:194:4262<283:CCOSD>2.0.CO;2-1
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
— computation — algorithms — biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/ℓ) Σ_{i=1}^{ℓ} V(y_i, f(x_i)) + μ ‖f‖²_K ]
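The regularization functional on this slide (empirical loss plus a kernel-norm penalty, minimized over a hypothesis space) has, for the square loss, the classical closed form given by the representer theorem: f(x) = Σ_i c_i K(x, x_i) with c = (K + μℓI)⁻¹ y. A minimal sketch, assuming a Gaussian kernel and illustrative hyperparameters `mu` and `gamma`:

```python
import numpy as np

def rls_fit(X, y, mu=0.1, gamma=1.0):
    """Regularized least squares in an RKHS:
    min_f (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2,
    solved via the representer theorem with a Gaussian kernel."""
    n = len(X)
    # Kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    # c = (K + mu * n * I)^{-1} y
    c = np.linalg.solve(K + mu * n * np.eye(n), y)

    def f(Xnew):
        sq_new = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_new) @ c

    return f
```

With a small `mu` the estimator nearly interpolates the training data; increasing `mu` trades training error for the stability that the surrounding slides connect to predictivity.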
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
Bulletin (New Series) of the American Mathematical Society
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C. R. Shelton
Introduction
A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful. In
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1-5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
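The stability property in this abstract (delete one training example and the learned hypothesis barely moves) can be probed empirically. The sketch below is an illustration of the idea, not the paper's formal CV-loo stability definition; `loo_stability` and `make_ridge` are hypothetical helper names chosen here.

```python
import numpy as np

def make_ridge(mu):
    """Regularized least-squares linear fit; larger mu = more regularization."""
    def fit(X, y):
        n, d = X.shape
        w = np.linalg.solve(X.T @ X + mu * n * np.eye(d), X.T @ y)
        return lambda Z: Z @ w
    return fit

def loo_stability(X, y, fit, n_probe=50, seed=0):
    """Average change in predictions at random probe points when one
    training example is deleted: an empirical proxy for the
    leave-one-out stability discussed in the abstract."""
    rng = np.random.default_rng(seed)
    probes = rng.uniform(X.min(), X.max(), (n_probe, X.shape[1]))
    f_full = fit(X, y)
    deltas = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        f_loo = fit(X[mask], y[mask])
        deltas.append(np.abs(f_full(probes) - f_loo(probes)).mean())
    return float(np.mean(deltas))
```

On noisy data, increasing `mu` makes the learned map markedly more stable in this measure, which is the qualitative connection between stability and predictivity that the letter develops formally.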
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = (x_i, y_i), i = 1…n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability: A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data: The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
  S = { z_1 = (x_1, y_1), …, z_n = (x_n, y_n) }

Learning algorithms: A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions: We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error: The expected error of a function f is defined as
  I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
  I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error: The following quantity, called empirical error, can be computed given the training data S:
  I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)

Generalization and consistency: An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
  lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
  lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1.
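A quick numerical illustration of the definitions in Box 1 (my own sketch, not from the paper): for a fixed function f, the empirical error I_S[f] converges to the expected error I[f] as n grows, by the law of large numbers. The paper's subject is the harder, uniform statement for a learned f_S.

```python
import numpy as np

def empirical_error(f, xs, ys):
    """I_S[f] = (1/n) sum_i V(f, z_i) with the square loss
    V(f, z) = (f(x) - y)^2, as defined in Box 1."""
    return float(np.mean((f(xs) - ys) ** 2))
```

For example, with x ~ N(0, 1), y = x + 0.5·noise, and the fixed function f = 0, the expected error is I[f] = E[y²] = 1.25; the empirical error on a large sample lands close to this value.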
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database
• 1,000+ real, 3,000+ virtual (face patterns)
• 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human brain
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in the "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
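The model family cited here alternates two operations: a tuning ("S") stage that matches the input against stored templates, and an invariance ("C") stage that max-pools the tuning responses over position. A minimal sketch of one S/C pair; the Gaussian tuning function and pooling width are illustrative simplifications of the published architectures:

```python
import numpy as np

def s_layer(image_patches, templates, sigma=1.0):
    """Tuning operation: Gaussian similarity of each patch to each template.
    patches: (n, k), templates: (m, k) -> responses (n, m)."""
    d2 = ((image_patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def c_layer(s_responses, pool=4):
    """Invariance operation: max over a local pool of positions."""
    n, m = s_responses.shape
    n_pools = n // pool
    return s_responses[: n_pools * pool].reshape(n_pools, pool, m).max(axis=1)
```

Because the C stage keeps only the maximum, its output is unchanged when the matching patch shifts anywhere within a pool; stacking such pairs yields the gradual build-up of selectivity and invariance described for the ventral stream.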
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
[Stimulus sequence: image 20 ms; interstimulus interval 30 ms; mask (1/f noise) 80 ms]
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
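A sketch of the "matrix-like read-out" idea on synthetic data: a linear classifier trained on a population response matrix (trials × neurons). The 2005 study used linear classifiers in this spirit; the least-squares readout and the synthetic population below are my own illustrative choices, not the study's methods.

```python
import numpy as np

def train_linear_readout(R, labels):
    """Least-squares linear readout from a population response matrix
    R (n_trials x n_neurons) to binary labels (+1 / -1)."""
    R1 = np.hstack([R, np.ones((len(R), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(R1, labels, rcond=None)
    return lambda Rnew: np.sign(np.hstack([Rnew, np.ones((len(Rnew), 1))]) @ w)
```

With a simulated population in which a small subset of "category-selective" neurons shifts its mean firing with the stimulus category, this readout decodes category from held-out trials well above chance, which is the qualitative point of the decoding result.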
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Work at 3 levels
• Fixation and tracking behavior of the fly
• Motion algorithms and circuits: the beetle (and the fly); relative motion algorithms and circuits (similar in spirit to HMAX and DiCarlo)
• Biophysics of computation
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy. ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. – Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
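The multiplicative scheme attributed to Hassenstein & Reichardt can be sketched as a delay-and-multiply correlator with two mirror-symmetric subunits whose outputs are subtracted. The pure-shift delay below is a stand-in for the low-pass filters of the original model:

```python
import numpy as np

def reichardt_correlator(s1, s2, delay=3):
    """Minimal Hassenstein-Reichardt correlator sketch.
    s1, s2: signals at two adjacent receptors (circular 1-D arrays).
    Positive output = net motion from receptor 1 toward receptor 2."""
    d1 = np.roll(s1, delay)  # delayed copy of receptor 1
    d2 = np.roll(s2, delay)  # delayed copy of receptor 2
    # Multiply each delayed signal with the opposite undelayed one,
    # then subtract the mirror subunit (opponent stage).
    return float(np.mean(d1 * s2 - d2 * s1))
```

For a drifting grating that reaches receptor 2 a few samples after receptor 1, the detector's mean output is positive; reversing the direction of motion flips the sign, which is the directional selectivity the passage discusses.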
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces such as their distance and the presence of edges must be recovered from the primary image data Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data A recent development in this field sees early vision as a set of ill-posed problems which can be solved by the use of regularization methods These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereomatching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes. Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images(5). Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics In classical optics or in computer graphics the basic problem is to determine the images of three-dimensional objects whereas vision is confronted with the inverse problem of recovering surfaces from images As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images vision must often rely on natural constraints that is assumptions about the physical world to derive unambiguous output The identification and use of such constraints is a recurring theme in the analysis of specific vision problems
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman(6), one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
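The aperture problem just described becomes well-posed once a smoothness assumption is added. The sketch below recovers a full velocity vector at each contour point from the normal components alone, by minimizing a data term plus a first-difference smoothness penalty, in the spirit of the regularization approach the article develops (the discretization and the weight `lam` are illustrative choices):

```python
import numpy as np

def smoothest_velocity_field(normals, v_n, lam=1.0):
    """Recover 2-D velocities V_i at n contour points from normal
    components v_n by minimizing
        sum_i (V_i . N_i - v_n[i])^2 + lam * sum_i |V_{i+1} - V_i|^2
    as a single linear least-squares problem.
    normals: (n, 2) unit normals; v_n: (n,) measured normal speeds."""
    n = len(v_n)
    # Unknown vector u = [Vx_0..Vx_{n-1}, Vy_0..Vy_{n-1}]
    A_data = np.zeros((n, 2 * n))
    for i, (nx, ny) in enumerate(normals):
        A_data[i, i] = nx
        A_data[i, n + i] = ny
    # First-difference smoothness on each component (open contour).
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    A_smooth = np.sqrt(lam) * np.block([
        [D, np.zeros((n - 1, n))],
        [np.zeros((n - 1, n)), D],
    ])
    A = np.vstack([A_data, A_smooth])
    b = np.concatenate([v_n, np.zeros(2 * (n - 1))])
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u[:n], u[n:]
```

When the true motion is a single rigid translation and the normals rotate along the contour, the smoothest consistent field is exactly that translation, so the recovery is exact; with noisy measurements, `lam` trades data fidelity against smoothness.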
The difficulties of the problem of edge detection are somewhat different Edge detection denotes the process of identifying the
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.

Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection, at the level of direction selectivity of the ganglion cells, results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
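The two schemes contrasted above can be caricatured in a few lines of code: a Hassenstein-Reichardt correlator (delay and multiply) and a Barlow-Levick veto unit (delayed inhibition, AND-NOT). The signals, delays and weights below are illustrative choices, not taken from the paper.

```python
import numpy as np

def delayed(signal, d):
    """Delay a discrete-time signal by d samples (zero-padded at the start)."""
    out = np.zeros_like(signal)
    out[d:] = signal[:-d]
    return out

def reichardt(r1, r2, d=3):
    """Hassenstein-Reichardt correlator: each receptor's delayed signal is
    multiplied with the neighbour's undelayed one; the difference of the two
    mirror-symmetric subunits is direction selective (nonlinear, multiplicative)."""
    return np.sum(delayed(r1, d) * r2 - r1 * delayed(r2, d))

def barlow_levick(r1, r2, d=3):
    """Barlow-Levick scheme: excitation from receptor 2 is 'vetoed' by an
    appropriately delayed inhibitory signal from receptor 1."""
    return np.sum(np.clip(r2 - delayed(r1, d), 0.0, None))

# A bright bar crossing receptor 1 first, then receptor 2 three samples later.
t = np.arange(100)
pulse = np.exp(-0.5 * ((t - 40) / 4.0) ** 2)
r1, r2 = pulse, delayed(pulse, 3)

pref = reichardt(r1, r2)   # this direction: positive response
null = reichardt(r2, r1)   # reversed motion: response changes sign
```

For the correlator the two directions give responses of opposite sign, while for the veto unit this same sequence is the null direction: the delayed inhibition from receptor 1 arrives together with the excitation at receptor 2 and cancels it, and the reversed sequence escapes the veto.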
Cooperative neural network for stereo
© 1979 T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about 3, whereas for the mammalian cortex it lies between 10 and 10,000.

Although this fact points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithm, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers but probably well-suited to the highly interactive organization of nervous systems.

The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term 'cooperative' refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.

In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms.
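The cooperative algorithm of the Marr-Poggio stereo paper, a network of disparity nodes shaped by its two local constraints (uniqueness along each line of sight, continuity within a disparity layer), can be caricatured in one dimension. The neighborhood size, inhibition weight and threshold below are illustrative choices, not the paper's parameters.

```python
import numpy as np

def cooperative_stereo(left, right, max_d=3, iters=15):
    """1-D sketch of a Marr-Poggio-style cooperative algorithm. Node C[x, d]
    encodes the hypothesis 'disparity d at position x'. Excitation spreads
    along each disparity layer (continuity constraint); nodes sharing a line
    of sight inhibit one another (uniqueness constraint); thresholding after
    each iteration keeps the node states binary."""
    n = len(left)
    # initial state: 1 wherever the two images match at disparity d
    C0 = np.array([[float(left[x] == right[(x + d) % n])
                    for d in range(max_d + 1)] for x in range(n)])
    C = C0.copy()
    for _ in range(iters):
        # excitatory support from neighbours in the same disparity layer
        excit = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))
        # inhibition from rival disparities along the same line of sight
        inhib = C.sum(axis=1, keepdims=True) - C
        C = ((excit - inhib + C0) >= 3.0).astype(float)  # threshold
    return C.argmax(axis=1)

# Random-dot "stereogram": the right image is the left shifted by 2 pixels,
# so the true disparity is 2 everywhere (with many spurious binary matches).
rng = np.random.default_rng(0)
left = rng.integers(0, 2, 60)
right = np.roll(left, 2)
disp = cooperative_stereo(left, right)
```

On this toy pair the network should suppress the spurious matches (whose runs shrink from their ends under the inhibition) and settle on the true disparity layer at most positions.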
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
Science is currently published by American Association for the Advancement of Science
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.

In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.

Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
  - computation
  - algorithms
  - biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖²_K ]
Predictive regularization algorithms
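The regularization algorithms referred to here minimize a functional of the form (1/n) Σ_i V(y_i, f(x_i)) + μ‖f‖²_K over a reproducing-kernel space H. For the square loss, the representer theorem reduces the minimization to a linear system. A minimal sketch, with a Gaussian kernel, toy data, and illustrative parameters:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """K(x, x') = exp(-(x - x')^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def train_rls(x, y, mu=1e-4, sigma=0.7):
    """Regularized least squares: by the representer theorem, the minimizer of
    (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over the RKHS of K is
    f(t) = sum_i c_i K(x_i, t), with coefficients solving (K + mu*n*I) c = y."""
    n = len(x)
    c = np.linalg.solve(gaussian_kernel(x, x, sigma) + mu * n * np.eye(n), y)
    return lambda t: gaussian_kernel(t, x, sigma) @ c

# fit a noisy sine from 80 samples
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 80))
y = np.sin(x) + 0.1 * rng.normal(size=80)
f = train_rls(x, y)
```

The regularization parameter μ trades empirical error against the smoothness penalty ‖f‖²_K, which is what makes such algorithms predictive rather than merely interpolating.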
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.

We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.

Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).

Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory[1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
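The stability property in this abstract, delete one training example and ask how much the learned hypothesis moves, can be probed numerically. The sketch below is illustrative (regularized least squares with a Gaussian kernel, not the paper's experiments): it measures the largest leave-one-out change of the hypothesis over a grid of test points, for a small and a large training set.

```python
import numpy as np

def krr_fit(x, y, mu=0.1):
    """Regularized least squares with a Gaussian kernel; returns the learned
    hypothesis f_S as a callable function."""
    K = np.exp(-(x[:, None] - x[None, :]) ** 2)
    c = np.linalg.solve(K + mu * len(x) * np.eye(len(x)), y)
    return lambda t: np.exp(-(t[:, None] - x[None, :]) ** 2) @ c

def loo_stability(x, y, grid, mu=0.1):
    """Largest change of the hypothesis (sup over a grid of points, max over
    deleted examples) when a single training example is removed -- the
    perturbation considered in the stability property."""
    f_full = krr_fit(x, y, mu)(grid)
    changes = []
    for i in range(len(x)):
        f_loo = krr_fit(np.delete(x, i), np.delete(y, i), mu)(grid)
        changes.append(np.max(np.abs(f_full - f_loo)))
    return max(changes)

rng = np.random.default_rng(1)
grid = np.linspace(-3, 3, 50)
x_small = rng.uniform(-3, 3, 20)
x_big = rng.uniform(-3, 3, 200)
beta_small = loo_stability(x_small, np.sin(x_small), grid)
beta_big = loo_stability(x_big, np.sin(x_big), grid)  # stability improves with n
```

For such regularized algorithms the leave-one-out change shrinks roughly like 1/(μn), which is the kind of stability that, by the paper's result, implies generalization.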
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications[6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^{n}. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.

Training data. The training data comprise input and output pairs. The input space X is assumed to be a compact domain in a Euclidean space, and the output space Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = { z_1 = (x_1, y_1), …, z_n = (x_n, y_n) }.

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i).

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] − inf_{f∈H} I[f] > ε ) = 0.
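Box 1's empirical error I_S[f] is a sample average of the loss, so for a *fixed* hypothesis f it converges to the expected error I[f] by the law of large numbers; a short numerical illustration (the subtlety of generalization is precisely that f_S depends on S, which is why uniform or stability conditions are needed):

```python
import numpy as np

# Fixed hypothesis f(x) = 2x, square loss V(f, z) = (f(x) - y)^2.
# Data: x ~ Uniform(0, 1), y = 2x + noise with noise ~ N(0, 0.5^2),
# so the expected error is I[f] = E[(f(x) - y)^2] = 0.25 exactly.
rng = np.random.default_rng(0)

def empirical_error(n):
    x = rng.uniform(0, 1, n)
    y = 2 * x + 0.5 * rng.normal(size=n)
    return np.mean((2 * x - y) ** 2)          # I_S[f] on a sample of size n

gap_small = abs(empirical_error(10) - 0.25)   # noisy for small n
gap_large = abs(empirical_error(100_000) - 0.25)
```

With a hundred thousand samples the gap |I_S[f] − I[f]| is reliably below one percent of the expected error's scale.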
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | p. 419 © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research (face detection), on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain
  - 10^10-10^11 neurons (~1 million flies)
  - 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
  - ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  - ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
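The feedforward models discussed later in this deck capture this gradual build-up of complexity and invariance by alternating template-matching ("simple"-cell) and max-pooling ("complex"-cell) stages, as in Riesenhuber & Poggio's HMAX. A toy sketch (templates, image and pooling scheme are illustrative):

```python
import numpy as np

def s_layer(image, templates):
    """'Simple'-cell stage: selectivity, implemented as correlation of every
    local patch with a small set of templates."""
    k = templates.shape[1]
    H, W = image.shape
    out = np.zeros((len(templates), H - k + 1, W - k + 1))
    flat = templates.reshape(len(templates), -1)
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[:, i, j] = flat @ image[i:i + k, j:j + k].ravel()
    return out

def c_layer(s):
    """'Complex'-cell stage: max pooling over position (here over the whole
    map, the limiting case), trading position information for translation
    invariance while keeping the template selectivity."""
    return s.max(axis=(1, 2))

# Two toy 2x2 templates: a horizontal pair of ON pixels and a vertical pair.
templates = np.array([[[1, 1], [0, 0]],
                      [[1, 0], [1, 0]]], dtype=float)

img = np.zeros((8, 8))
img[3, 2:6] = 1.0                      # a horizontal bar
img_shift = np.roll(img, 1, axis=1)    # the same bar shifted by one pixel

s1, s2 = s_layer(img, templates), s_layer(img_shift, templates)
c1, c2 = c_layer(s1), c_layer(s2)      # identical despite the shift
```

The S-layer maps change when the bar moves, but the pooled C-layer responses do not: stacking such stages yields units with larger receptive fields, more complex preferred stimuli, and more invariance, the trend seen along V1 → V2 → V4 → IT.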
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; † Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations, may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?

Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer simulations.
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995, Vol 5 No 5, 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Stimulus sequence: image (20 ms); image-mask interval (ISI, 30 ms); mask (1/f noise, 80 ms).
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Biophysics of computation (motion detection)
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre and T. Poggio
Università di Genova, Istituto di Fisica, Genoa, Italy; Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after an appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
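The multiplicative scheme discussed above can be sketched in a few lines as an opponent correlator on two adjacent receptors: one channel is delayed and multiplied with its neighbour's direct signal, and the mirror term is subtracted to give a signed, direction-selective output. This is a minimal caricature for illustration; the one-sample delay and the pulse stimulus are assumptions, not the 1978 circuit.

```python
# Minimal sketch of a two-channel multiplicative motion detector on two
# adjacent receptors r1 and r2 (discrete time). The delay of one sample
# and the toy stimulus are illustrative assumptions.

def correlator(r1, r2, delay=1):
    # multiply the delayed signal of one receptor with the direct signal of
    # its neighbour; opponent subtraction of the mirror term yields a signed
    # direction-selective response
    out = 0.0
    for t in range(delay, len(r1)):
        out += r1[t - delay] * r2[t] - r2[t - delay] * r1[t]
    return out

# a pulse moving from receptor 1 to receptor 2 (preferred direction)
r1 = [0, 1, 0, 0]
r2 = [0, 0, 1, 0]
print(correlator(r1, r2))   # positive: preferred direction
print(correlator(r2, r1))   # negative: reverse direction
```

The sign flip under motion reversal is the essential direction-selective behaviour; a veto scheme of the Barlow & Levick type replaces the multiplication by a delayed inhibitory nonlinearity but yields a similar asymmetry.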
© Nature Publishing Group 1985
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Università di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2-1/2-D sketch and to Barrow and Tenenbaum's intrinsic images [5]. Recently it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation) and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman [6],
one can assume that the contour corresponds to locations of significant intensity change Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve Local motion measurements provide only the normal component of velocity The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour such as a corner) The problem of estimating the full velocity field is thus in general underdetermined by the measurements that are directly available from the image The measurement of the optical flow is inherently ambiguous It can be made unique only by adding information or assumptions
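The decomposition just described can be sketched numerically. In this hedged example (all numbers illustrative), the local measurement at a vertical contour recovers only the component of the true velocity along the intensity gradient; the tangential component is invisible, which is exactly the underdetermination the text describes.

```python
# Sketch of the aperture problem: a purely local measurement along a contour
# yields only the component of velocity normal to the contour (i.e. along
# the intensity gradient). The velocity and gradient values are illustrative.
import math

def normal_component(v, gradient):
    # project the true velocity v onto the unit intensity-gradient direction
    gx, gy = gradient
    norm = math.hypot(gx, gy)
    ux, uy = gx / norm, gy / norm
    s = v[0] * ux + v[1] * uy          # scalar normal speed
    return (s * ux, s * uy)

v_true = (1.0, 1.0)                    # full 2-D velocity (unknown locally)
grad = (1.0, 0.0)                      # vertical contour: gradient along x
print(normal_component(v_true, grad))  # -> (1.0, 0.0): the y-component is lost
```

Making the full velocity field unique then requires extra information or assumptions, such as the smoothness constraints introduced by the regularization approach.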
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
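The cooperative algorithm can be caricatured in a few lines: a binary state C[x][d] over image position x and candidate disparity d is iterated, with excitatory support from neighbouring positions within the same disparity plane (continuity) and inhibition across disparities at the same position (uniqueness along lines of sight). The sketch below is a 1-D simplification with illustrative constants and neighbourhoods, not the paper's exact network.

```python
# Illustrative cooperative-stereo update (1-D simplification): excitation
# within a disparity plane, inhibition across disparities at each position,
# thresholded against the initial matches. eps, theta and the neighbourhood
# size are assumptions for this sketch.

def step(C, initial, eps=2.0, theta=1.5):
    X, D = len(C), len(C[0])
    new = [[0] * D for _ in range(X)]
    for x in range(X):
        for d in range(D):
            # excitatory support: same disparity, neighbouring positions
            exc = sum(C[x2][d] for x2 in range(max(0, x - 1), min(X, x + 2)))
            # inhibition: other disparities competing at the same position
            inh = sum(C[x][d2] for d2 in range(D)) - C[x][d]
            new[x][d] = 1 if exc - eps * inh + initial[x][d] >= theta else 0
    return new

# a consistent disparity plane (d=0) plus one spurious match at (x=2, d=1)
initial = [[1, 0], [1, 0], [1, 1], [1, 0], [1, 0]]
C = step(initial, initial)
print([row[0] for row in C])  # -> [1, 1, 1, 1, 1]: the plane survives
print(C[2][1])                # -> 0: the spurious match is suppressed
```

After one update the locally supported disparity plane remains while the isolated false match, lacking excitatory neighbours and facing inhibition from the competing disparity, drops out.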
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power and ability to integrate insights and data from neuroscience, psychology and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing study of these problems.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
bull Marr's book Vision (Marr 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
- computation
- algorithms
- biophysics and circuits
bull The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio 1977…
bull …part of which comes from Reichardt and Poggio 1976 (Q. Rev. Biophysics, Part I)…
bull …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics
bull Computer vision
bull Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H}\; \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2$$
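A regularization functional of this form can be instantiated, for the square loss, as regularized least squares with a kernel: the minimizer is f(x) = sum_i c_i k(x_i, x) with c solving (K + mu*l*I) c = y. The sketch below is illustrative; the Gaussian kernel and the values of gamma and mu are assumptions, not tied to any specific slide.

```python
# Minimal regularized-least-squares sketch of the functional above, using a
# Gaussian kernel on 1-D data. Kernel choice, gamma and mu are illustrative.
import numpy as np

def krls_fit(X, y, mu, gamma):
    # square loss V(y, f(x)) = (y - f(x))^2 gives the linear system
    # (K + mu * l * I) c = y, with f(x) = sum_i c_i k(x_i, x)
    l = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    return np.linalg.solve(K + mu * l * np.eye(l), y)

def krls_predict(X_train, c, x, gamma):
    # evaluate f(x) = sum_i c_i k(x_i, x)
    return float(np.exp(-gamma * (X_train - x) ** 2) @ c)

X = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * X)
c = krls_fit(X, y, mu=1e-3, gamma=50.0)
print(abs(krls_predict(X, c, 0.25, gamma=50.0) - 1.0) < 0.25)  # near sin(pi/2)
```

The parameter mu trades data fit against the smoothness penalty ||f||_K^2, which is the regularization idea running from the 1985 Nature paper through the learning-theory work below.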
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)

(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
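The stability property described in the abstract can be illustrated directly: train on the full set, then retrain with each single example deleted and check that the learned hypothesis barely moves. The sketch below uses one-parameter ridge regression as the learning map; the data, the regularization constant and the 0.2 threshold are illustrative assumptions, not the paper's quantitative definition.

```python
# Hedged illustration of leave-one-out stability: delete one training example
# and verify the learned hypothesis changes little. The learning map is
# one-parameter ridge regression; all constants are illustrative.
import numpy as np

def fit(xs, ys, mu=0.1):
    # ridge regression through the origin: minimizes
    # (1/n) * sum_i (y_i - w*x_i)^2 + mu * w^2,
    # whose solution is w = <x, y> / (<x, x> + mu*n)
    n = len(xs)
    return float(xs @ ys) / (float(xs @ xs) + mu * n)

rng = np.random.default_rng(0)
xs = rng.normal(size=50)
ys = 2.0 * xs + 0.1 * rng.normal(size=50)

w_full = fit(xs, ys)
loo_changes = [abs(fit(np.delete(xs, i), np.delete(ys, i)) - w_full)
               for i in range(len(xs))]
print(max(loo_changes) < 0.2)  # no single deletion moves the hypothesis much
```

The regularization term is what makes the map stable: with mu = 0 and few data points, deleting one example can move w substantially, and the stability-predictivity connection says that such an unstable map would also generalize poorly.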
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = (x_i, y_i)_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set $S$ and selects, as the estimated function, the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
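As a minimal sketch of ERM (the hypothesis space and data below are my own toy example), the algorithm simply scans the candidate functions and keeps the one with the smallest training error:

```python
import numpy as np

def erm(hypothesis_space, S):
    """Empirical risk minimization: return the hypothesis with the
    smallest average square loss on the training set S."""
    xs, ys = zip(*S)
    xs, ys = np.array(xs), np.array(ys)
    def empirical_error(f):
        return np.mean((f(xs) - ys) ** 2)
    return min(hypothesis_space, key=empirical_error)

# Toy hypothesis space: linear functions x -> a*x with a on a small grid.
H = [lambda x, a=a: a * x for a in np.linspace(-2, 2, 41)]
# Noiseless training data generated with slope 1.5 (which is in the grid).
S = [(x, 1.5 * x) for x in np.linspace(-1, 1, 20)]
f_S = erm(H, S)
print(f_S(2.0))  # close to 3.0
```

Real ERM algorithms search continuous hypothesis spaces by optimization rather than enumeration, but the selection criterion, minimal empirical error, is the same.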
Box 1 | Formal definitions in supervised learning

Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0$$

Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = (z_1, \ldots, z_n) = ((x_1, y_1), \ldots, (x_n, y_n))$$

Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \cup_{n \ge 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.

Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.

Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.

Empirical error. The following quantity, called the empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$

Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\epsilon > 0$,
$$\lim_{n\to\infty} P\left( I[f_S] \le \inf_{f \in \mathcal{H}} I[f] + \epsilon \right) = 1$$
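These definitions can be made concrete with a toy distribution (my own example, not from the paper): for a fixed hypothesis f, the empirical error I_S[f] converges to the expected error I[f] as the training set grows.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 2.0 * x                      # a fixed hypothesis
# Toy distribution mu: x ~ Uniform(0, 1), y = 2x + Gaussian noise with
# standard deviation 0.5, so the expected square loss is I[f] = 0.25.
def empirical_error(n):
    x = rng.uniform(0, 1, n)
    y = 2.0 * x + rng.normal(0, 0.5, n)
    return np.mean((f(x) - y) ** 2)

for n in (10, 1000, 100000):
    print(n, empirical_error(n))  # approaches I[f] = 0.25 as n grows
```

Here I[f] is computable only because we chose the distribution ourselves; in the learning setting it is unknown, which is exactly why generalization, the closeness of I_S[f] to I[f], matters.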
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature | 419
© 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual faces; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
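The graded view tuning reported in these results is often summarized as a bell-shaped (for example Gaussian) tuning curve around the preferred view. The sketch below is schematic, with made-up parameters, not the paper's measured tuning:

```python
import numpy as np

def view_tuned_response(angle_deg, preferred_deg=0.0, width_deg=30.0):
    """Schematic view-tuned IT unit: maximal response at the preferred
    view, declining gradually as the object rotates away from it."""
    d = angle_deg - preferred_deg
    return np.exp(-d ** 2 / (2 * width_deg ** 2))

angles = np.array([0, 30, 60, 90])
print(view_tuned_response(angles))  # declines monotonically from 1.0
```

An ensemble of such units, each with a different preferred view, can cover all vantage points of an object, which is the population-code idea stated in the conclusion above.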
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Stimulus sequence: image (20 ms) → image-mask interval (30 ms ISI) → mask, 1/f noise (80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
... in 2013 ...
Biophysics of Computation
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain.
A synaptic mechanism possibly underlying directional selectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection, as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant, respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear, and in particular of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
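The Barlow & Levick veto scheme can be caricatured in a few lines of code (a binary, discrete-time toy of my own, not the synaptic model the paper proposes): a unit responds to input at receptor 1 unless a delayed signal from the adjacent receptor 2 vetoes it.

```python
import numpy as np

def veto_detector(r1, r2, delay=1):
    """Barlow & Levick-style unit: respond when receptor 1 is active
    AND the delayed signal from adjacent receptor 2 does NOT veto it."""
    r2_delayed = np.concatenate([np.zeros(delay), r2[:-delay]])
    return r1 * (1 - r2_delayed)   # excitation AND-NOT delayed inhibition

# A spot moving in the preferred direction activates receptor 1 first...
preferred = veto_detector(r1=np.array([1, 0, 0]), r2=np.array([0, 1, 0]))
# ...while in the null direction receptor 2 fires first and its delayed
# signal vetoes the later response at receptor 1.
null = veto_detector(r1=np.array([0, 1, 0]), r2=np.array([1, 0, 0]))
print(preferred.sum(), null.sum())  # 1.0 0.0
```

The asymmetric delay on the inhibitory channel is what makes the unit directional; the Reichardt scheme replaces the AND-NOT with a multiplicative (correlation) interaction.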
© 1985 Nature Publishing Group
Computational vision and regularization theory
Tomaso Poggio, Vincent Torre & Christof Koch
Artificial Intelligence Laboratory and Center for Biological Information Processing, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, USA
Istituto di Fisica, Universita di Genova, Genova, Italy
Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods. These lead to algorithms and parallel analog circuits that can solve ill-posed problems and which are suggestive of neural equivalents in the brain.
COMPUTATIONAL vision denotes a new field in artificial intelligence, centred on theoretical studies of visual information processing. Its two main goals are to develop image understanding systems, which automatically construct scene descriptions from image input data, and to understand human vision.
Early vision is the set of visual modules that aim to extract the physical properties of the surfaces around the viewer, that is, distance, surface orientation and material properties (reflectance, colour, texture). Much current research has analysed processes in early vision because the inputs and the goals of the computation can be well characterized at this stage (see refs 1-4 for reviews). Several problems have been solved and several specific algorithms have been successfully developed. Examples are stereo matching, the computation of the optical flow, structure from motion, shape from shading and surface reconstruction.
A new theoretical development has now emerged that unifies much of these results within a single framework. The approach has its roots in the recognition of a common structure of early vision problems. Problems in early vision are ill-posed, requiring specific algorithms and parallel hardware. Here we introduce a specific regularization approach and discuss its implications for computer vision and parallel computer architectures, including parallel hardware that could be used by biological visual systems.
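Standard (Tikhonov) regularization of the kind discussed here can be sketched on a toy ill-posed inverse problem; the blur operator and parameters below are my own illustration, not an example from the paper.

```python
import numpy as np

# Toy ill-posed inverse problem: recover a signal z from blurred,
# noisy data y = A z + noise, where A is a smoothing (blur) operator.
rng = np.random.default_rng(2)
n = 60
t = np.linspace(0, 1, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.002)
A /= A.sum(axis=1, keepdims=True)
z_true = np.sin(2 * np.pi * t)
y = A @ z_true + 0.01 * rng.normal(size=n)

# Naive inversion amplifies the noise through the tiny singular values
# of A; Tikhonov regularization instead minimizes
#   |A z - y|^2 + lam |z|^2,  i.e.  z = (A'A + lam I)^{-1} A'y.
z_naive = np.linalg.solve(A, y)
lam = 1e-3
z_reg = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

err_naive = np.linalg.norm(z_naive - z_true)
err_reg = np.linalg.norm(z_reg - z_true)
print(err_reg < err_naive)  # regularization stabilizes the inversion
```

The regularizer encodes a natural constraint (here, a preference for small-norm solutions; smoothness penalties are more common in vision), which is exactly how regularization restores well-posedness.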
Early vision processes
Early vision consists of a set of processes that recover physical properties of the visible three-dimensional surfaces from the two-dimensional intensity arrays. Their combined output roughly corresponds to Marr's 2½-D sketch and to Barrow and Tenenbaum's intrinsic images5. Recently, it has been customary to assume that these early vision processes are general and do not require domain-dependent knowledge, but only
Examples of early vision processes
• Edge detection
• Spatio-temporal interpolation and approximation
• Computation of optical flow
• Computation of lightness and albedo
• Shape from contours
• Shape from texture
• Shape from shading
• Binocular stereo matching
• Structure from motion
• Structure from stereo
• Surface reconstruction
• Computation of surface colour
generic constraints about the physical world and the imaging stage (see box). They represent conceptually independent modules that can be studied, to a first approximation, in isolation. Information from the different processes, however, has to be combined. Furthermore, different modules may interact early on. Finally, the processing cannot be purely bottom-up: specific knowledge may trickle down to the point of influencing some of the very first steps in visual information processing.
Computational theories of early vision modules typically deal with the dual issues of representation and process. They must specify the form of the input and the desired output (the representation), and provide the algorithms that transform one into the other (the process). Here we focus on the issue of processes and algorithms, for which we describe the unifying theoretical framework of regularization theories. We do not consider the equally important problem of the primitive tokens that represent the input of each specific process.
A good definition of early vision is that it is inverse optics. In classical optics, or in computer graphics, the basic problem is to determine the images of three-dimensional objects, whereas vision is confronted with the inverse problem of recovering surfaces from images. As so much information is lost during the imaging process that projects the three-dimensional world into the two-dimensional images, vision must often rely on natural constraints, that is, assumptions about the physical world, to derive unambiguous output. The identification and use of such constraints is a recurring theme in the analysis of specific vision problems.
Two important problems in early vision are the computation of motion and the detection of sharp changes in image intensity (for detecting physical edges). They illustrate well the difficulty of the problems of early vision. The computation of the two-dimensional field of velocities in the image is a critical step in several schemes for recovering the motion and the three-dimensional structure of objects. Consider the problem of determining the velocity vector V at each point along a smooth contour in the image. Following Marr and Ullman6, one can assume that the contour corresponds to locations of significant intensity change. Figure 1 shows how the local velocity vector is decomposed into a normal and a tangential component to the curve. Local motion measurements provide only the normal component of velocity. The tangential component remains invisible to purely local measurements (unless they refer to some discontinuous features of the contour, such as a corner). The problem of estimating the full velocity field is thus, in general, underdetermined by the measurements that are directly available from the image. The measurement of the optical flow is inherently ambiguous: it can be made unique only by adding information or assumptions.
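The decomposition just described can be written out directly (a toy numerical sketch; the vectors are made up): local measurements recover only the projection of the true velocity onto the contour normal.

```python
import numpy as np

def normal_component(v, tangent):
    """Project the true velocity v onto the unit normal of a contour
    whose local direction is given by `tangent`; this is all that a
    purely local motion measurement can recover (the aperture problem)."""
    t = tangent / np.linalg.norm(tangent)
    normal = np.array([-t[1], t[0]])      # unit normal to the contour
    return (v @ normal) * normal

v_true = np.array([1.0, 0.5])             # full image velocity at a point
v_normal = normal_component(v_true, tangent=np.array([1.0, 1.0]))
# The tangential part of v_true is invisible locally: infinitely many
# true velocities share this same normal component.
print(v_normal)
```

Recovering the full velocity field from such normal components alone requires an extra constraint, for example smoothness of the flow along the contour, which is precisely the kind of assumption regularization supplies.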
The difficulties of the problem of edge detection are somewhat different. Edge detection denotes the process of identifying the sharp changes in image intensity that correspond to physical edges.
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception, and Vision provides inspiration for the continuing work on them.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
– computation – algorithms – biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits," Marr and Poggio, 1977...
• ...part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)...
• ...which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
$$\min_{f \in H} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
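With the square loss, this regularization functional reduces to kernel regularized least squares. A minimal sketch on synthetic data (the kernel, data, and parameter values here are illustrative, not from the talk):

```python
# Kernel regularized least squares (square loss): a sketch of
#   min_{f in H} (1/n) sum_i V(y_i, f(x_i)) + mu ||f||_K^2
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rls_fit(X, y, mu=1e-3):
    # representer theorem: f(x) = sum_i c_i K(x, x_i),
    # with coefficients solving (K + n mu I) c = y
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + len(X) * mu * np.eye(len(X)), y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
c = rls_fit(X, y)
y_hat = gaussian_kernel(X, X) @ c        # predictions at the training points
train_err = np.abs(y_hat - y).mean()
print(train_err)                          # small but nonzero: mu smooths the fit
```

The linear solve is the whole algorithm; mu trades training error against the smoothness of the learned function.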
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY
Volume 39, Number 1, Pages 1-49
S 0273-0979(01)00923-5
Article electronically published on October 5, 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA, Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
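The ERM prescription can be made concrete with a toy finite hypothesis space of threshold classifiers; the data, noise level, and threshold grid below are hypothetical:

```python
# Empirical risk minimization over a toy hypothesis space H of
# threshold classifiers h_t(x) = [x >= t]: the algorithm "looks" at S
# and returns the t with the smallest training error. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = (x >= 0.6).astype(int)                 # true rule: threshold at 0.6
y = y ^ (rng.random(200) < 0.05)           # flip 5% of the labels (noise)

thresholds = np.linspace(0.0, 1.0, 101)    # the hypothesis space H
emp_err = [((x >= t).astype(int) != y).mean() for t in thresholds]
t_erm = thresholds[int(np.argmin(emp_err))]
print(t_erm)                               # close to the true threshold 0.6
```

Despite the label noise, minimizing the training error over this small hypothesis space recovers a threshold near the true one.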
Box 1 | Formal definitions in supervised learning
Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (for example, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$.
Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \{z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\}.$$
Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L : \bigcup_{n \ge 1} Z^n \to H$, where $H$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric, that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.
Loss functions. We denote the price we pay with $V(f, z)$ when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss $V(f, z) = (f(x) - y)^2$.
Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z),$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of the square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y).$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.
Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).$$
Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 \text{ in probability.}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} P\Big(I[f_S] - \inf_{f \in H} I[f] \le \varepsilon\Big) = 1.$$
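The stability property at the heart of this paper, delete one training example and the hypothesis should barely move, can be checked numerically for regularized least squares; a sketch on synthetic data (model, seed, and constants are illustrative):

```python
# Leave-one-out stability, numerically: remove one example from S,
# retrain, and measure how much the learned hypothesis changes.
# Regularized linear least squares on synthetic data.
import numpy as np

def fit(X, y, mu=0.1):
    # ridge solution: w = (X'X + n mu I)^{-1} X'y
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * mu * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

w_full = fit(X, y)
changes = []
for i in range(n):                         # perturb S by deleting example i
    keep = np.arange(n) != i
    w_i = fit(X[keep], y[keep])
    changes.append(np.abs(X @ (w_full - w_i)).max())
print(max(changes))                        # the hypothesis barely moves
```

The regularization term is what keeps every leave-one-out perturbation small; with mu = 0 the worst-case change grows.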
Letters to Nature. Nature, Vol. 428, 25 March 2004, p. 419 (www.nature.com/nature). © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database • 1,000+ real, 3,000+ virtual • 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain
– 10^10-10^11 neurons (~1 million flies)
– 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
– ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
– ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
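The gradual build-up of receptive field size and invariance can be sketched with alternating template-matching and max-pooling stages, in the spirit of Riesenhuber & Poggio (1999). This toy 1-D version (stimuli, templates, and pool sizes are all hypothetical) shows position invariance emerging from pooling:

```python
# Alternating tuning ("simple"-like) and max-pooling ("complex"-like)
# stages: pooling over position grows receptive fields and yields
# tolerance to stimulus shifts at the top of the hierarchy.
import numpy as np

def s_layer(signal, template):
    # local template matching (correlation with a small template)
    k = len(template)
    return np.array([signal[i:i + k] @ template for i in range(len(signal) - k + 1)])

def c_layer(resp, pool=5):
    # max over a spatial neighborhood, with stride equal to the pool size
    return np.array([resp[i:i + pool].max() for i in range(0, len(resp) - pool + 1, pool)])

bar = np.array([1.0, 1.0, 1.0])              # a tiny "bar" template
stim_a = np.zeros(30); stim_a[5:8] = 1.0     # bar at position 5
stim_b = np.zeros(30); stim_b[6:9] = 1.0     # same bar, shifted by 1

top_a = c_layer(c_layer(s_layer(stim_a, bar)), pool=3)
top_b = c_layer(c_layer(s_layer(stim_b, bar)), pool=3)
print(np.allclose(top_a, top_b))             # top-level response tolerates the shift
```

The first-layer responses to the two stimuli differ, but two stages of max pooling leave the top-level representation unchanged by the small shift.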
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995, Vol. 5, No. 5, p. 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not?
Sequence: image (20 ms); image-mask interval (ISI 30 ms); mask (1/f noise, 80 ms)
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
... in 2013 ...
Proc. R. Soc. Lond. B 202, 409-416 (1978). Printed in Great Britain
A synaptic mechanism possibly underlying directionalselectivity to motion
By V. Torre† and T. Poggio‡
† Università di Genova, Istituto di Fisica, Genoa, Italy; ‡ Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
(Communicated by B. B. Boycott, F.R.S. - Received 1 February 1978)
A specific synaptic interaction is proposed as the mechanism underlying the directional selectivity to motion of several nervous cells. It is shown that the hypothesis is consistent with previous behavioural and physiological studies of the motion detection process.
Detection of movement is one of the most basic and elementary computations performed by visual systems. Hence it is not surprising that the mechanisms and principles underlying movement detection have been approached in various species with a variety of techniques, from behavioural analysis and psychophysics to physiology. Although several investigators have provided a wealth of information in the last years, the early analyses of Hassenstein & Reichardt (1956), Reichardt (1957, 1961), Barlow & Hill (1963) and Barlow & Levick (1965) still represent the extent of our understanding of this function. These studies are in many respects complementary. Those of Reichardt & Hassenstein are centred on the functional principles of movement detection as inferred from the average optomotor behaviour of a whole insect, whereas Barlow & Levick attack the problem of the neural circuitry underlying directional selectivity in the ganglion cells of a vertebrate retina.
Figure 1a and b summarize the main conclusions of the two approaches. Both models postulate the existence of two types of channels (1 and 2, from two adjacent receptor regions) with different conduction properties. In figure 1a, channel 1 and channel 2 are low-pass filters with a short and a long time constant respectively, while in figure 1b channel 2 simply contains a delay.
Perhaps the most significant contribution of Barlow & Levick consists of the experimental recognition that movement detection at the level of direction selectivity of the ganglion cells results primarily from an inhibitory mechanism that 'vetoes' the response to simultaneous signals from the receptors (after appropriate asymmetric delay), rather than from the detection of the conjunction of excitation from two regions (see figure 1). On the other hand, the main thrust of Hassenstein & Reichardt's analysis is the demonstration that the interaction underlying movement detection must be nonlinear and, in particular, of a multiplicative type. Many experimental data suggest that this is indeed the functional scheme underlying movement detection in insects (Poggio & Reichardt 1976).
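The multiplicative scheme of Hassenstein & Reichardt can be sketched as a discrete-time correlator: one channel is low-pass filtered (delayed), multiplied with the neighboring input, and the mirror-image product is subtracted, so the sign of the opponent output encodes direction. All constants below are illustrative:

```python
# A discrete-time Reichardt-style correlator (a sketch, not the
# original continuous-time model): delayed channel times neighbor,
# minus the mirror-image subunit.
import numpy as np

def lowpass(x, tau=5.0):
    # first-order low-pass filter: a leaky running average (adds delay)
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

def reichardt(left, right):
    # correlate the delayed left input with the right input, and subtract
    # the opposite pairing (opponent stage)
    return lowpass(left) * right - lowpass(right) * left

t = np.arange(300)
pulse = np.exp(-0.5 * ((t - 100) / 10.0) ** 2)              # stimulus at the left receptor
out_rightward = reichardt(pulse, np.roll(pulse, 5)).mean()  # reaches the right receptor later
out_leftward = reichardt(np.roll(pulse, 5), pulse).mean()   # opposite direction of motion
print(out_rightward > 0 > out_leftward)
```

The opponent subtraction makes the detector exactly antisymmetric: reversing the direction of motion flips the sign of the mean output.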
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
Perhaps one of the most striking differences between a brain and today's computers is the amount of wiring. In a digital computer the ratio of connections to components is about ..., whereas for the mammalian cortex it lies between ... and ....
Although this points to a clear structural difference between the two, this distinction is not fundamental to the nature of the information processing that each accomplishes, merely to the particulars of how each does it. In Chomsky's terms, this difference affects theories of performance but not theories of competence, because the nature of a computation that is carried out by a machine or a nervous system depends only on a problem to be solved, not on the available hardware. Nevertheless, one can expect a nervous system and a digital computer to use different types of algorithms, even when performing the same underlying computation. Algorithms with a parallel structure, requiring many simultaneous local operations on large data arrays, are expensive for today's computers, but probably well-suited to the highly interactive organization of nervous systems.
The class of parallel algorithms includes an interesting and not precisely definable subclass which we may call cooperative algorithms. Such algorithms operate on many input elements and reach a global organization by way of local interactive constraints. The term cooperative refers to the way in
which local operations appear to cooperate in forming global order in a well-regulated manner. Cooperative phenomena are well known in physics, and it has been proposed that they may play an important role in biological systems as well. One of the earliest suggestions along these lines was made by Julesz, who maintains that stereoscopic fusion is a cooperative process. His model, which consists of an array of dipole magnets with springs coupling the tips of adjacent dipoles, represents a suggestive metaphor for this idea. Besides its biological relevance, the extraction of stereoscopic information is an important and yet unsolved problem in visual information processing. For this reason, and also as a case in point, we describe a cooperative algorithm for this computation.
In this article we (i) analyze the computational structure of the stereo-disparity problem, stating the goal of the computation and characterizing the associated local constraints; (ii) describe a cooperative algorithm that implements this computation; and (iii) exhibit its performance on random-dot stereograms. Although the problem addressed here is not directly related to the question of ...
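The cooperative algorithm described here can be sketched in one dimension: cells C[x, d] vote for a match at position x and disparity d, with excitation along constant disparity (the continuity constraint) and inhibition along lines of sight (the uniqueness constraint). The neighborhoods, weights, and threshold below are illustrative, not the paper's exact values:

```python
# A 1-D cooperative stereo sketch on a random-dot pair: iterate a
# threshold update with excitation along the same disparity and
# inhibition among competing disparities for the same position.
import numpy as np

rng = np.random.default_rng(4)
n, true_d, D = 80, 2, 5                  # image size, planted disparity, disparity range
left = rng.integers(0, 2, n)
right = np.roll(left, true_d)            # random-dot pair: right image = shifted left image

# initial state C0[x, d] = 1 wherever left[x] matches right[x + d]
C0 = np.stack([(left == np.roll(right, -d)).astype(float) for d in range(D)], axis=1)
C = C0.copy()

for _ in range(10):
    excit = sum(np.roll(C, s, axis=0) for s in (-2, -1, 1, 2))   # continuity: same disparity
    inhib = C.sum(axis=1, keepdims=True) - C                     # uniqueness: same position
    C = ((excit - 1.5 * inhib + C0) >= 2.0).astype(float)        # threshold update

print(int(C.sum(axis=0).argmax()))       # recovers the planted disparity
```

Spurious matches lack contiguous support at their disparity and are suppressed by inhibition, so after a few iterations the planted disparity plane dominates.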
Cooperative Computation of Stereo Disparity
D. Marr and T. Poggio
Science, New Series, Vol. 194, No. 4262 (Oct. 15, 1976), pp. 283-287
Vision: A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Shimon Ullman; Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing effort to understand perception.
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels
— computation — algorithms — biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14-17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works — and how it may suggest better computer vision systems
min_{f ∈ H} [ (1/n) Σ_{i=1}^{n} V(y_i, f(x_i)) + μ ‖f‖_K² ]
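For the square loss, the Tikhonov minimization above has a standard closed-form solution: the minimizer can be written f(x) = Σ_i c_i K(x_i, x) with c = (K + μnI)⁻¹y, where K is the kernel matrix on the training points. A minimal numpy sketch (the Gaussian kernel, the toy data, and all parameter values are illustrative choices, not taken from the slides):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=1e-2, sigma=1.0):
    # Closed-form minimizer of (1/n) sum V(y_i, f(x_i)) + mu ||f||_K^2
    # for the square loss: c = (K + mu * n * I)^{-1} y
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Toy usage: regress a noisy sine from 80 examples
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
f = rls_fit(X, y, mu=1e-3, sigma=0.5)
```

The regularization weight μ trades empirical error against the smoothness of f, which is exactly the tension the functional above expresses.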
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio¹, Ryan Rifkin¹⁴, Sayan Mukherjee¹³ & Partha Niyogi²
¹Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. ²Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. ³Cancer Genomics Group, Center for Genome Research/Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. ⁴Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory¹⁻⁵ was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
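The stability property invoked here — delete one training example and the learned hypothesis changes little — can be probed numerically for any concrete learning map. A rough sketch using plain ridge regression as a stand-in algorithm (the function names, data model, and parameters are illustrative, not the paper's):

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Plain ridge regression as a stand-in learning map L: S -> f_S
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_stability(X, y, lam=1.0):
    # Largest change in the prediction at a training point when that
    # point is deleted: a crude empirical proxy for leave-one-out stability.
    w_full = fit_ridge(X, y, lam)
    gaps = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w_i = fit_ridge(X[mask], y[mask], lam)
        gaps.append(abs(X[i] @ w_full - X[i] @ w_i))
    return max(gaps)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
beta = loo_stability(X, y, lam=1.0)  # small for a stable algorithm at this n
```

For a stable (and hence, by the paper's argument, predictive) algorithm this quantity shrinks as the training set grows.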
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications⁶. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning
Convergence in probability: A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0.
Training data: The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}.
Learning algorithms: A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions: We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².
Expected error: The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.
Empirical error: The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i).
Generalization and consistency: An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,
lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1.
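The generalization requirement — the gap |I[f_S] − I_S[f_S]| vanishing as n grows — can be illustrated with a small Monte-Carlo experiment: fit by ERM over a fixed hypothesis space, then compare the training error to the error on a large fresh sample. A toy sketch; the data model (noisy sine) and hypothesis space (cubic polynomials) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    # z = (x, y) with y = sin(x) + noise; square loss V(f, z) = (f(x) - y)^2
    x = rng.uniform(-3, 3, n)
    return x, np.sin(x) + 0.2 * rng.standard_normal(n)

def erm_poly(x, y, deg=3):
    # ERM over H = polynomials of degree <= deg (least-squares fit)
    return np.polyfit(x, y, deg)

def gap(n):
    x, y = sample(n)
    f = erm_poly(x, y)
    emp = np.mean((np.polyval(f, x) - y) ** 2)   # empirical error I_S[f_S]
    xt, yt = sample(50_000)                      # Monte-Carlo estimate of I[f_S]
    exp_err = np.mean((np.polyval(f, xt) - yt) ** 2)
    return abs(exp_err - emp)

# Generalization: |I[f_S] - I_S[f_S]| tends to 0 in probability as n grows
gaps = [gap(n) for n in (20, 200, 2000)]
```

Because the hypothesis space here is small and fixed, the classical ERM theory already guarantees the shrinking gap the simulation exhibits.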
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: • 1,000+ real, 3,000+ virtual face patterns • 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere); ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons — each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations — may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol. 5, No. 5: 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis, Pauls and Poggio, 1995; Logothetis and Pauls, 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
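The Riesenhuber–Poggio family of models alternates "S" (template-matching) and "C" (max-pooling) stages, with the max operation providing tolerance to position and scale. A toy sketch of one S→C pair — not the actual HMAX implementation; the image, templates, and sizes are arbitrary illustrations:

```python
import numpy as np

def s_layer(image, templates):
    # "S" (simple-cell) stage: template matching at every position (valid correlation)
    H, W = image.shape
    k = templates.shape[1]
    out = np.zeros((len(templates), H - k + 1, W - k + 1))
    for t, tmpl in enumerate(templates):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[t, i, j] = (image[i:i + k, j:j + k] * tmpl).sum()
    return out

def c_layer(maps, pool=2):
    # "C" (complex-cell) stage: local MAX pooling -> tolerance to small shifts
    T, H, W = maps.shape
    out = np.zeros((T, H // pool, W // pool))
    for i in range(H // pool):
        for j in range(W // pool):
            out[:, i, j] = maps[:, i*pool:(i+1)*pool, j*pool:(j+1)*pool].max(axis=(1, 2))
    return out

rng = np.random.default_rng(3)
img = rng.random((16, 16))
templates = rng.standard_normal((4, 3, 3))  # 4 hypothetical 3x3 "oriented" templates
c1 = c_layer(s_layer(img, templates))       # pooled feature maps, shape (4, 7, 7)
```

Stacking several such pairs yields responses that are increasingly selective (via S stages) and increasingly invariant (via C stages), mirroring the ventral-stream progression described above.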
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms.
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
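The "matrix-like read-out" of Hung et al. decodes category from the joint activity of an IT population with a simple linear classifier. A toy sketch with synthetic "population responses" — the data model and every parameter here are invented for illustration, not the recorded data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic "population responses": 400 trials x 64 neurons, two object
# categories, each driving the population toward a different mean pattern.
n, d = 400, 64
labels = rng.integers(0, 2, n)
patterns = rng.standard_normal((2, d))
R = patterns[labels] + 0.8 * rng.standard_normal((n, d))

# Linear (least-squares) readout: train a weight vector w on 300 trials,
# then classify held-out trials by the sign of the population projection R w.
train, test = slice(0, 300), slice(300, None)
w = np.linalg.lstsq(R[train], 2.0 * labels[train] - 1.0, rcond=None)[0]
pred = (R[test] @ w > 0).astype(int)
accuracy = (pred == labels[test]).mean()
```

The point the slide makes is that such a linear decoder suffices: the population code is close to linearly separable for category and identity.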
… in 2013 …
Cooperative neural network for stereo
~1979, T. Poggio and D. Marr, MPI Tübingen
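The Marr–Poggio cooperative algorithm combines excitation along each disparity plane (the continuity constraint) with inhibition among competing disparities at the same position (the uniqueness constraint), iterated until the network settles. A simplified 1-D sketch of such a network — a winner-take-all variant on a single scanline, not the paper's original 2-D thresholded network; all parameters are illustrative:

```python
import numpy as np

def cooperative_stereo_1d(left, right, d_max=3, iters=6, eps=0.3):
    # Nodes C[x, d] vote for "disparity d at position x". Support combines
    # excitation along the same disparity plane (continuity), inhibition
    # across disparities at the same x (uniqueness), and the initial matches.
    n, D = len(left), d_max + 1
    C0 = np.array([[float(left[x] == right[(x + d) % n]) for d in range(D)]
                   for x in range(n)])
    C = C0.copy()
    for _ in range(iters):
        excit = np.stack([np.convolve(C[:, d], np.ones(5) / 5, mode="same")
                          for d in range(D)], axis=1)
        inhib = C.sum(axis=1, keepdims=True) - C
        support = excit - eps * inhib + C0
        C = (support == support.max(axis=1, keepdims=True)).astype(float)
    return C.argmax(axis=1)

# Random-dot scanline: the right eye sees the left pattern shifted by disparity 2
rng = np.random.default_rng(5)
left = rng.integers(0, 2, 60)
right = np.roll(left, 2)
disp = cooperative_stereo_1d(left, right)
```

False matches are locally plausible but lack extended support along their disparity plane, so the iteration suppresses them while the true plane survives — the cooperative behavior the garbled excerpt above describes.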
Cooperative Computation of Stereo Disparity
D Marr T Poggio
Science New Series Vol 194 No 4262 (Oct 15 1976) pp 283-287
Stable URLhttplinksjstororgsicisici=0036-807528197610152933A1943A42623C2833ACCOSD3E20CO3B2-1
Science is currently published by American Association for the Advancement of Science
Your use of the JSTOR archive indicates your acceptance of JSTORs Terms and Conditions of Use available athttpwwwjstororgabouttermshtml JSTORs Terms and Conditions of Use provides in part that unless you have obtainedprior permission you may not download an entire issue of a journal or multiple copies of articles and you may use content inthe JSTOR archive only for your personal non-commercial use
Please contact the publisher regarding any further use of this work Publisher contact information may be obtained athttpwwwjstororgjournalsaaashtml
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission
JSTOR is an independent not-for-profit organization dedicated to and preserving a digital archive of scholarly journals Formore information regarding JSTOR please contact supportjstororg
httpwwwjstororgMon Jan 22 124953 2007
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marrs posthumously published Vision (1982) influenced a generation of brain and cognitive scientists inspiring many to enter the field In Vision Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood Researchers from a range of brain and cognitive sciences have long valued Marrs creativity intellectual power and ability to integrate insights and data from neuroscience psychology and computation This MIT Press edition makes Marrs influential work available to a new generation of students and scientists
In Marrs framework the process of vision constructs a set of representations starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment A central theme and one that has had far-reaching influence in both neuroscience and cognitive science is the notion of different levels of analysismdashin Marrs framework the computational level the algorithmic level and the hardware implementation level
Now thirty years later the main problems that occupied Marr remain fundamental open problems in the study of perception Vision provides inspiration for the continui
Visionwhatiswhere
A complex system must be understood at several different levels
Werner Reichardtrsquos scientific legacy Integrative Neuroscience
bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels
mdash computation mdash algorithms mdash biophysics and circuits
bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip
bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip
bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works ndash and how it may suggest better computer vision
systems
2
1
1min ( ( ))i i Kf H i
V y f x fmicroisin
=
⎡ ⎤+⎢ ⎥
⎣ ⎦sum
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances.
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001. 2000 Mathematics Subject Classification: Primary 68T05, 68P30. This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute, MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory(1–5) was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
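The stability property described in this abstract (delete one training example and the learned hypothesis barely changes) can be made concrete with a deliberately simple learning map. This is a toy sketch, not the paper's formal stability definition: the "algorithm" is ERM over constant functions with square loss, whose solution is the sample mean, and we measure how much the hypothesis moves under leave-one-out deletion:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_mean(y):
    # Toy learning algorithm: the constant hypothesis f_S(x) = mean(y),
    # which is the ERM solution over constant functions with square loss.
    return np.mean(y)

# Leave-one-out perturbation: deleting one point changes the mean
# by |y_i - mean| / (n - 1), i.e. the hypothesis change is O(1/n).
y = rng.standard_normal(50)
f_full = fit_mean(y)
changes = [abs(fit_mean(np.delete(y, i)) - f_full) for i in range(len(y))]
beta = max(changes)  # worst-case hypothesis change for this training set
```

The point of the sketch is the scaling: doubling n roughly halves `beta`, and it is this vanishing sensitivity to single-example deletion that the paper connects to generalization.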
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications(6). For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error), that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, \lim_{n\to\infty} |X_n - X| = 0 in probability) if and only if for every \epsilon > 0, \lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0.
Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution \mu(x, y) on the product space Z = X \times Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = \{ z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n) \}.
Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L : \bigcup_{n \ge 1} Z^n \to H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.
Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) - y)^2.
Expected error. The expected error of a function f is defined as
I[f] = \int_Z V(f, z) \, d\mu(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = \int_{X \times Y} (f(x) - y)^2 \, d\mu(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution \mu.
Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).
Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution \mu,
\lim_{n\to\infty} |I[f_S] - I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution \mu and any \epsilon > 0,
\lim_{n\to\infty} P\big( I[f_S] - \inf_{f \in H} I[f] > \epsilon \big) = 0.
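Box 1's generalization condition, the gap |I[f_S] - I_S[f_S]| vanishing as n grows, can be illustrated numerically for a one-parameter ERM algorithm. In this toy sketch the distribution μ is known (it never is in practice), so the expected error can be Monte-Carlo estimated; the linear model, noise level, and sample sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # z = (x, y) drawn i.i.d. from a (here known, normally unknown) mu.
    x = rng.uniform(0, 1, n)
    y = 2 * x + 0.2 * rng.standard_normal(n)
    return x, y

def erm_line(x, y):
    # ERM over H = {f(x) = a*x}: least squares picks the empirical minimizer.
    a = (x @ y) / (x @ x)
    return lambda t: a * t

def sq_loss(f, x, y):
    return np.mean((f(x) - y) ** 2)

gaps = []
for n in (10, 10000):
    x, y = sample(n)
    f_S = erm_line(x, y)
    I_S = sq_loss(f_S, x, y)          # empirical error I_S[f_S] on S
    xt, yt = sample(200_000)          # Monte-Carlo estimate of I[f_S]
    I = sq_loss(f_S, xt, yt)
    gaps.append(abs(I - I_S))
# the gap |I[f_S] - I_S[f_S]| shrinks as n grows
```

With n = 10 the empirical error is an unreliable proxy for the expected error; with n = 10,000 the two essentially agree, which is what "the algorithm generalizes" means in Box 1.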
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work?
bull Training Database: 1,000+ real and 3,000+ virtual faces; 500,000+ non-face patterns
Sung amp Poggio 1995
~15-year-old CBCL computer vision research: face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull Human Brain: 10^10–10^11 neurons (~1 million flies); 10^14–10^15 synapses
Vision: what is where
bull Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT. A gradual increase in the receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†. Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, 552
9.520, spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
Trial sequence: image (20 ms); image–mask interval (ISI, 30 ms); mask of 1/f noise (80 ms)
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
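The "matrix-like read-out" idea (a linear classifier applied to a population of recorded responses, as in Hung et al. 2005) can be caricatured with simulated data. Everything here is an illustrative assumption, not the actual experiment: the response model, noise levels, and a least-squares readout standing in for the regularized classifiers actually used. Invariance in the code shows up as above-chance transfer to a "novel position":

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy population readout: each of n_units "neurons" carries a
# category-dependent mean response plus a position-dependent offset
# plus trial noise; a linear readout is trained at one position.
n_units, n_trials = 100, 200
w_cat = rng.standard_normal(n_units)  # category signal direction

def responses(labels, position_shift):
    base = np.outer(labels, w_cat)  # +w_cat for category A, -w_cat for B
    noise = 0.8 * rng.standard_normal((len(labels), n_units))
    return base + 0.3 * position_shift + noise

labels = rng.choice([-1.0, 1.0], n_trials)
R_train = responses(labels, position_shift=0.0)
# Least-squares linear readout over the response matrix.
w = np.linalg.lstsq(R_train, labels, rcond=None)[0]

test_labels = rng.choice([-1.0, 1.0], n_trials)
R_test = responses(test_labels, position_shift=1.0)  # "novel position"
acc = np.mean(np.sign(R_test @ w) == test_labels)    # transfer accuracy
```

The design choice mirrors the experimental logic: if category information were not encoded in a position-tolerant way, the readout trained at one position would fall to chance at the other.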
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
Vision A Computational Investigation into the Human Representation and Processing of Visual Information
Foreword by Afterword by Tomaso Poggio
David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui
Vision: what is where
A complex system must be understood at several different levels
Werner Reichardtrsquos scientific legacy Integrative Neuroscience
bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels
mdash computation mdash algorithms mdash biophysics and circuits
bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip
bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip
bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works ndash and how it may suggest better computer vision
systems
2
1
1min ( ( ))i i Kf H i
V y f x fmicroisin
=
⎡ ⎤+⎢ ⎥
⎣ ⎦sum
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear
We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of
languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-
ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])
(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice
Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)
Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In
Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University
grant No 8780043
c2001 American Mathematical Society
1
General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2
1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA
Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering
One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label
In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data
In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses
What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-
ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the
algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate
Box 1Formal definitions in supervised learning
Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example
n1lim jXn 2Xj 0 in probability) if and only if for every e 0
n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z
S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as
I$f
zVf zdmz
which is also the expected error of a new sample z drawn from thedistribution In the case of square loss
I$f
XYfx2 y2dmxy
We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S
IS$f 1
n
X
n
i1
Vf zi
Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m
n1lim jI$fS2 IS$fSj 0 in probability
An algorithm is (universally) consistent if uniformly for any distributionm and any e 0
n1lim P I$fSf2Hinf I$famp 1
0
letters to nature
NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group
Why do hierarchical architectures work
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman & Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Bülthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Task: animal present or not? Stimulus sequence: image (20 ms), image-mask interval (30 ms ISI), then mask (1/f noise, 80 ms).
Thorpe et al. 1996; VanRullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization performance (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
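These hierarchical feedforward models are built from two alternating operations: a tuning stage (template matching, giving selectivity) and a max-pooling stage (giving tolerance to position). Below is a toy two-stage sketch of that S/C scheme; the template count, patch size, Gaussian tuning width and pooling range are all arbitrary choices for illustration, not the published model code.

```python
import numpy as np

rng = np.random.default_rng(0)

def s_layer(image, templates):
    """Tuning stage: Gaussian similarity of every image patch to each template."""
    ph, pw = templates.shape[1:]
    H, W = image.shape
    out = np.zeros((len(templates), H - ph + 1, W - pw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[i:i + ph, j:j + pw]
            d2 = ((templates - patch) ** 2).sum(axis=(1, 2))
            out[:, i, j] = np.exp(-d2)  # maximal response when patch matches template
    return out

def c_layer(s_maps, pool=4):
    """Invariance stage: max over a local spatial neighborhood."""
    n, H, W = s_maps.shape
    return s_maps.reshape(n, H // pool, pool, W // pool, pool).max(axis=(2, 4))

image = rng.random((19, 19))
templates = rng.random((8, 4, 4))   # 8 random 4x4 templates
s1 = s_layer(image, templates)      # (8, 16, 16): selective, position-specific
c1 = c_layer(s1, pool=4)            # (8, 4, 4): same selectivity, position-tolerant
```

Stacking further S/C pairs on top of `c1` yields units that prefer increasingly complex patterns while tolerating larger shifts, mirroring the gradual increase in complexity and invariance along the ventral stream.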
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio & DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
A complex system must be understood at several different levels
Werner Reichardt's scientific legacy: Integrative Neuroscience
• Marr's book Vision (Marr, 1982) had a great impact on computational neuroscience: a system as complex as the brain must be understood at several different levels:
  computation; algorithms; biophysics and circuits
• The argument came from "From Understanding Computation to Understanding Neural Circuits", Marr and Poggio, 1977…
• …part of which comes from Reichardt and Poggio, 1976 (Q. Rev. Biophysics, Part I)…
• …which is a follow-up of Werner's argument for starting the Max-Planck-Institut für Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics • Computer vision • Computer graphics: speech synthesis, creating a virtual actor
How visual cortex works - and how it may suggest better computer vision systems
$$\min_{f \in H}\left[\frac{1}{n}\sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu\,\|f\|_K^2\right]$$
Predictive regularization algorithms
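For the square loss, the functional min over f in H of (1/n) Σᵢ V(yᵢ, f(xᵢ)) + μ‖f‖²_K is Tikhonov regularization in a reproducing-kernel Hilbert space, and the representer theorem reduces it to linear algebra: the minimizer is f(x) = Σᵢ cᵢ K(x, xᵢ) with (K + μnI)c = y. A minimal sketch with a Gaussian kernel; the kernel width, μ and the toy target function are arbitrary choices.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.3):
    # Pairwise Gaussian kernel between row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=1e-3, sigma=0.3):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over the RKHS of K.
    Representer theorem: f(x) = sum_i c_i K(x, x_i), with (K + mu*n*I) c = y."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, sigma) @ c

# Toy regression: noisy samples of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)
f = rls_fit(X, y)

X_test = rng.uniform(-1, 1, (200, 1))
test_mse = ((f(X_test) - np.sin(3 * X_test[:, 0])) ** 2).mean()  # small: the fit generalizes
```

Larger μ trades data fit for a smoother, more stable solution, which connects to the stability property discussed in the Nature excerpt further on.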
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1-49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial." (T. Poggio and C.R. Shelton)
Introduction
A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA
(2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA
(3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA
(4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^{n}. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0.

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:
S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as
I[f] = ∫_Z V(f, z) dμ(z),
which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,
I[f] = ∫_{X×Y} (f(x) − y)² dμ(x, y).
We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:
I_S[f] = (1/n) Σ_{i=1}^{n} V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,
lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability.
An algorithm is (universally) consistent if, uniformly for any distribution μ and for any ε > 0,
lim_{n→∞} P( I[f_S] ≤ inf_{f∈H} I[f] + ε ) = 1.
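The generalization notion defined in Box 1 (the gap |I[f_S] − I_S[f_S]| vanishing in probability as n grows) can be checked numerically on a toy problem: fit a one-parameter linear hypothesis by ERM with square loss, approximate the expected error I[f_S] by Monte Carlo on a large held-out sample, and compare it with the empirical error I_S[f_S]. The data-generating model below is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Draw n i.i.d. pairs from an (invented) distribution mu(x, y).
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + 0.3 * rng.standard_normal(n)   # "true" relation plus noise
    return x, y

def erm_linear(x, y):
    """ERM over H = {f(x) = w*x}: the w minimizing the empirical square loss."""
    w = (x @ y) / (x @ x)
    return lambda t: w * t

x_mc, y_mc = sample(100_000)                     # Monte Carlo proxy for the distribution
gaps = {}
for n in (10, 100, 1000, 10_000):
    x, y = sample(n)
    f = erm_linear(x, y)
    empirical = ((f(x) - y) ** 2).mean()         # I_S[f_S]
    expected = ((f(x_mc) - y_mc) ** 2).mean()    # ~ I[f_S]
    gaps[n] = abs(expected - empirical)
    print(n, round(gaps[n], 4))                  # the gap typically shrinks as n grows
```

For ERM over such a small hypothesis space, the classical theory guarantees this behavior; the paper's point is that a leave-one-out stability condition on the learning map gives the same guarantee for algorithms beyond ERM.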
Letters to Nature. Nature, Vol 428, 25 March 2004, www.nature.com/nature, p. 419. © 2004 Nature Publishing Group.
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research on face detection; on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10-10^11 neurons (~1 million flies), 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey: ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere); ~15×10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT. A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
Werner Reichardtrsquos scientific legacy Integrative Neuroscience
bull Marrrsquos book Vision (Marr 1982) had a great impact on computational neuroscience a system as complex as the brain must be understood at several different levels
mdash computation mdash algorithms mdash biophysics and circuits
bull The argument came from ldquoFrom Understanding Computation to Understanding Neural Circuitsrdquo Marr and Poggio 1977hellip
bull hellippart of which comes from Reichardt and Poggio 1976 (Q Rev Biophysics Part I)hellip
bull hellipwhich is a follow-up of Wernerrsquos argument for starting the Max-Planck-Institute fuer Biologische Kybernetik
MIT (1981-)
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works ndash and how it may suggest better computer vision
systems
2
1
1min ( ( ))i i Kf H i
V y f x fmicroisin
=
⎡ ⎤+⎢ ⎥
⎣ ⎦sum
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear
We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of
languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-
ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])
(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice
Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)
Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In
Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University
grant No 8780043
c2001 American Mathematical Society
1
General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2
1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA
Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering
One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label
In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data
In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses
What we assume in the above examples is a machine that istrained instead of programmed to perform a task given data of theform S xiyini1 Training means synthesizing a function thatbest represents the relation between the inputs x i and the corre-sponding outputs y iThe basic requirement for any learning algorithm is generaliz-
ation the performance on the training examples (empirical error)must be a good indicator of the performance on future examples(expected error) that is the difference between the two must belsquosmallrsquo (see Box 1 for definitions see also Fig 1)Probably the most natural learning algorithm is ERM the
algorithm lsquolooksrsquo at the training set S and selects as the estimatedfunction the one that minimizes the empirical error (training error)over the functions contained in a hypothesis space of candidate
Box 1Formal definitions in supervised learning
Convergence in probability A sequence of random variables Xnconverges in probability to a random variable X (for example
n1lim jXn 2Xj 0 in probability) if and only if for every e 0
n1limPjXn 2Xj e 0Training data The training data comprise input and output pairs Theinput data X is assumed to be a compact domain in an euclideanspace and the output data Y is assumed to be a closed subset of RkThere is an unknown probability distribution m(xy) on the productspace Z X pound Y The training set S consists of n independent andidentically drawn samples from the distribution on Z
S z1 x1y1hellipzn xnynLearning algorithms A learning algorithm takes as input a data set Sand outputs a function fS that represents the relation between theinput x and output y Formally the algorithm can be stated as a mapL ltn$1Zn HwhereH called the hypothesis space is the space offunctions the algorithm lsquosearchesrsquo to select fS We assume that thealgorithm is symmetric that is fS does not depend on the ordering ofthe samples in S Most learning algorithms are either regression orclassification algorithms depending on whether y is real-valued orbinary valuedLoss functions We denote the price we pay with V(f z) when theprediction for a given x is f(x) and the true value is y We assume thatV(f z) is always bounded A classical example of a loss function is thesquare loss Vf z fx2 y2Expected error The expected error of a function f is defined as
I$f
zVf zdmz
which is also the expected error of a new sample z drawn from thedistribution In the case of square loss
I$f
XYfx2 y2dmxy
We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S
IS$f 1
n
X
n
i1
Vf zi
Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m
n1lim jI$fS2 IS$fSj 0 in probability
An algorithm is (universally) consistent if uniformly for any distributionm and any e 0
n1lim P I$fSf2Hinf I$famp 1
0
letters to nature
NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group
Why do hierarchical architectures work
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkeys' environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Stimulus sequence: image (20 ms) → image–mask interval (ISI, 30 ms) → mask (1/f noise, 80 ms).
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% model vs 80% humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
… in 2013 …
MIT (1981-)
43rd Stated Meeting of the NRP Associates, March 14–17, 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works – and how it may suggest better computer vision systems
min_{f∈H} [ Σ_{i=1}^{n} V(y_i, f(x_i)) + µ ‖f‖²_K ]
Predictive regularization algorithms
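For the square loss, the Tikhonov regularization functional on this slide has a closed-form minimizer by the representer theorem: f(x) = Σ_i c_i K(x, x_i) with c = (K + µI)^-1 y (if the empirical term is averaged by n, only the scaling of µ changes). A minimal numpy sketch; the Gaussian kernel and the toy sine-regression data are my own illustrative choices, not from the slides:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, mu=0.5, sigma=1.0):
    # Minimize sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2 over the RKHS of K.
    # Representer theorem: f(x) = sum_i c_i K(x, x_i), c = (K + mu I)^{-1} y.
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * np.eye(len(X)), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

f = rls_fit(X, y)
err = float(np.abs(f(X) - np.sin(X[:, 0])).mean())
print(f"mean |f(x) - sin(x)| on the training inputs: {err:.3f}")
```

Larger µ trades data fit for a smoother solution with smaller RKHS norm, which is what makes the algorithm predictive rather than merely interpolating.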
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
– T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000, and in revised form June 1, 2001.
2000 Mathematics Subject Classification: Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
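The stability property stated in this abstract (delete one training example; the learned hypothesis should barely change) can be probed numerically for a regularized least-squares learner. The sketch below is a toy illustration with data and a sup-norm measure of change of my own choosing, not the paper's formal stability definitions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    # Regularized least squares: w = (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_full = ridge(X, y, lam)

# Leave-one-out perturbation: delete example i, retrain, and measure how much
# the hypothesis (its predictions on the inputs) moves.
changes = [np.abs(X @ (w_full - ridge(np.delete(X, i, axis=0),
                                      np.delete(y, i), lam))).max()
           for i in range(n)]
max_change = max(changes)
print(f"largest leave-one-out change in predictions: {max_change:.4f}")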
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.

In the auditory domain, one may consider a variety of problems. Consider speaker authentication. The input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.

In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i)}_{i=1}^n. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1 | Formal definitions in supervised learning
Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,

lim_{n→∞} P(|X_n − X| > ε) = 0

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a Euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

S = {z_1 = (x_1, y_1), …, z_n = (x_n, y_n)}

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)².

Expected error. The expected error of a function f is defined as

I[f] = ∫_Z V(f, z) dμ(z)
which is also the expected error of a new sample z drawn from thedistribution In the case of square loss
I$f
XYfx2 y2dmxy
We would like to find functions for which I[f] is small However wecannot compute I[f] because we are not given the distribution mEmpirical error The following quantity called empirical error can becomputed given the training data S
IS$f 1
n
X
n
i1
Vf zi
Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m
n1lim jI$fS2 IS$fSj 0 in probability
An algorithm is (universally) consistent if uniformly for any distributionm and any e 0
n1lim P I$fSf2Hinf I$famp 1
0
letters to nature
NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group
Why do hierarchical architectures work
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
43rd Stated Meeting of the NRP Associates March 14-17 1982
Learning theory + algorithms
Computational Neuroscience
models+experiments
ENGINEERING APPLICATIONS
bull Bioinformatics bull Computer vision bull Computer graphics speech synthesis creating a virtual actor
How visual cortex works ndash and how it may suggest better computer vision
systems
2
1
1min ( ( ))i i Kf H i
V y f x fmicroisin
=
⎡ ⎤+⎢ ⎥
⎣ ⎦sum
Predictive regularization algorithms
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THEAMERICAN MATHEMATICAL SOCIETYVolume 39 Number 1 Pages 1ndash49S 0273-0979(01)00923-5Article electronically published on October 5 2001
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
The problem of learning is arguably at thevery core of the problem of intelligenceboth biological and artificial
T Poggio and CR Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning andthe primary role of sampling (inductive inference) We try to emphasize relationsof the theory of learning to the mainstream of mathematics In particular thereare large roles for probability theory for algorithms such as least squares and fortools and ideas from linear algebra and linear analysis An advantage of doing thisis that communication is facilitated and the power of core mathematics is moreeasily brought to bear
We illustrate what we mean by learning theory by giving some instances(a) The understanding of language acquisition by children or the emergence of
languages in early human cultures(b) In Manufacturing Engineering the design of a new wave of machines is an-
ticipated which uses sensors to sample properties of objects before duringand after treatment The information gathered from these samples is to beanalyzed by the machine to decide how to better deal with new input objects(see [43])
(c) Pattern recognition of objects ranging from handwritten letters of the alpha-bet to pictures of animals to the human voice
Understanding the laws of learning plays a large role in disciplines such as (Cog-nitive) Psychology Animal Behavior Economic Decision Making all branches ofEngineering Computer Science and especially the study of human thought pro-cesses (how the brain works)
Mathematics has already played a big role towards the goal of giving a univer-sal foundation of studies in these disciplines We mention as examples the theoryof Neural Networks going back to McCulloch and Pitts [25] and Minsky and Pa-pert [27] the PAC learning of Valiant [40] Statistical Learning Theory as devel-oped by Vapnik [42] and the use of reproducing kernels as in [17] among manyother mathematical developments We are heavily indebted to these developmentsRecent discussions with a number of mathematicians have also been helpful In
Received by the editors April 2000 and in revised form June 1 20012000 Mathematics Subject Classification Primary 68T05 68P30This work has been substantially funded by CERG grant No 9040457 and City University
grant No 8780043
c2001 American Mathematical Society
1
General conditions for predictivityin learning theoryTomaso Poggio1 Ryan Rifkin14 Sayan Mukherjee13 amp Partha Niyogi2
1Center for Biological and Computational Learning McGovern InstituteComputer Science Artificial Intelligence Laboratory Brain Sciences DepartmentMIT Cambridge Massachusetts 02139 USA2Departments of Computer Science and Statistics University of Chicago ChicagoIllinois 60637 USA3Cancer Genomics Group Center for Genome ResearchWhitehead InstituteMITCambridge Massachusetts 02139 USA4Honda Research Institute USA Inc Boston Massachusetts 02111 USA
Developing theoretical foundations for learning is a key steptowards understanding intelligence lsquoLearning from examplesrsquo isa paradigm in which systems (natural or artificial) learn afunctional relationship from a training set of examples Withinthis paradigm a learning algorithm is a map from the space oftraining sets to the hypothesis space of possible functionalsolutions A central question for the theory is to determineconditions under which a learning algorithm will generalizefrom its finite training set to novel examples A milestone inlearning theory1ndash5 was a characterization of conditions on thehypothesis space that ensure generalization for the natural classof empirical risk minimization (ERM) learning algorithms thatare based on minimizing the error on the training set Here weprovide conditions for generalization in terms of a precisestability property of the learning process when the training setis perturbed by deleting one example the learned hypothesisdoes not change much This stability property stipulates con-ditions on the learning map rather than on the hypothesis spacesubsumes the classical theory for ERM algorithms and is appli-cable to more general algorithms The surprising connectionbetween stability and predictivity has implications for the foun-dations of learning theory and for the design of novel algorithmsand provides insights into problems as diverse as languagelearning and inverse problems in physics and engineering
One of the main impacts of learning theory is on engineeringSystems that learn from examples to perform a specific task havemany applications6 For instance a system may be needed torecognize whether an image contains a face or not Such a systemcould be trained with positive and negative examples images withand without faces In this case the input image is a point in amultidimensional space of variables such as pixel values its associ-ated output is a binary lsquoyesrsquo or lsquonorsquo label
In the auditory domain one may consider a variety of problemsConsider speaker authentication The input is an acoustic utteranceand the system has to determine whether it was produced by aparticular target speaker or not Training examples would thenconsist of a set of utterances each labelled according to whether ornot they were produced by the target speaker Similarly in speechrecognition one wishes to learn a function that maps acousticutterances to their underlying phonetic sequences In learning thesyntax of a language one wishes to learn a function that mapssequences of words to their grammaticality values These functionscould be acquired from training data
In another application in computational biology algorithms havebeen developed that can produce a diagnosis of the type of cancerfrom a set of measurements of the expression level of manythousands of human genes in a biopsy of the tumour measuredwith a complementary DNA microarray containing probes for anumber of genes Again the software learns the classification rulefrom a set of examples that is from examples of expression patternsin a number of patients with known diagnoses
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection
On the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8–9, 2016
Moore-like law for ML (1995-2018)
• Human brain: 10^10–10^11 neurons (~1 million flies), 10^14–10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey:
– ~10^9 neurons in the ventral stream (350×10^6 in each hemisphere)
– ~15×10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes.
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkeys' environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons (each tuned to a different object aspect, and each showing a certain degree of invariance to image transformations) may, as an ensemble, encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not? Stimulus sequence: image (20 ms) → blank image-mask interval (30 ms ISI) → mask, 1/f noise (80 ms).
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
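The alternation of template matching ("simple"-unit tuning) and max pooling ("complex"-unit invariance) that defines these hierarchical feedforward models (Riesenhuber & Poggio 1999) can be sketched minimally. The patch size, template count and Gaussian tuning width below are illustrative assumptions, not the published HMAX parameters:

```python
import numpy as np

def s_layer(image_patches, templates, sigma=1.0):
    """'Simple'-unit stage: Gaussian tuning of each patch to each stored template."""
    # image_patches: (n_patches, d), templates: (n_templates, d)
    d2 = ((image_patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))           # (n_patches, n_templates)

def c_layer(s_responses):
    """'Complex'-unit stage: max pooling over positions gives position tolerance."""
    return s_responses.max(axis=0)                  # (n_templates,)

def extract_patches(image, size=4):
    """Slide a size x size window over the image and flatten each patch."""
    H, W = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(H - size + 1)
                     for j in range(W - size + 1)])

rng = np.random.default_rng(0)
templates = rng.normal(size=(8, 16))                # 8 learned 4x4 templates (illustrative)
image = rng.normal(size=(16, 16))
features = c_layer(s_layer(extract_patches(image), templates))
```

Stacking such S/C pairs yields features that grow in preferred-stimulus complexity and in invariance, mirroring the V1 → V2 → V4 → IT progression described above.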
Decoding the neural code Matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
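The "matrix-like read-out" of Hung et al. 2005 amounted to training a linear classifier on vectors of population firing rates. A rough sketch on synthetic data; the population size, tuning model and trial counts are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 64, 200

# Synthetic population responses: each category shifts the mean firing rate
# of every neuron according to a random per-neuron preference (illustrative).
category = rng.integers(0, 2, size=n_trials)             # object category, 0 or 1
tuning = rng.normal(size=n_neurons)                      # per-neuron category preference
rates = rng.normal(size=(n_trials, n_neurons)) + np.outer(2 * category - 1, tuning)

# "Matrix-like" linear readout: one weight per neuron, fitted by least squares.
X = np.hstack([rates, np.ones((n_trials, 1))])           # add a bias column
w, *_ = np.linalg.lstsq(X[:100], 2 * category[:100] - 1, rcond=None)

pred = (X[100:] @ w > 0).astype(int)                     # decode held-out trials
accuracy = (pred == category[100:]).mean()               # near-perfect on this toy data
```

The point of the exercise is the same as in the readout experiments: a simple linear operation on the population vector suffices to recover category from distributed responses.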
… in 2013 …
Learning theory + algorithms
Computational neuroscience: models + experiments
ENGINEERING APPLICATIONS
• Bioinformatics
• Computer vision
• Computer graphics, speech synthesis, creating a virtual actor
How visual cortex works, and how it may suggest better computer vision systems
$$\min_{f \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} V\big(y_i, f(x_i)\big) + \mu \, \|f\|_K^2 \right]$$
Predictive regularization algorithms
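The functional on this slide is Tikhonov regularization over a reproducing kernel Hilbert space H with kernel K; for the square loss, the representer theorem gives the minimizer in closed form, f(x) = Σ_i c_i K(x_i, x) with c = (K + µnI)^(-1) y. A minimal sketch; the Gaussian kernel, its width, the data and µ are illustrative choices:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(X, y, mu=1e-3, sigma=0.5):
    """Minimize (1/n) sum_i (y_i - f(x_i))^2 + mu ||f||_K^2 over the RKHS."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + mu * n * np.eye(n), y)   # representer-theorem coefficients
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=50)  # noisy smooth target
f = rls_fit(X, y)
train_err = np.mean((f(X) - y) ** 2)                 # small, but nonzero: mu smooths
```

Setting µ = 0 recovers pure ERM over the span of the kernel sections; µ > 0 trades a little empirical error for the stability that, per the Nature paper below, is what guarantees predictivity.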
Theorems on foundations of learning
MIT (1981-)
BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, Volume 39, Number 1, Pages 1–49. S 0273-0979(01)00923-5. Article electronically published on October 5, 2001.
ON THE MATHEMATICAL FOUNDATIONS OF LEARNING
FELIPE CUCKER AND STEVE SMALE
"The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial."
T. Poggio and C.R. Shelton
Introduction
(1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear.
We illustrate what we mean by learning theory by giving some instances:
(a) The understanding of language acquisition by children, or the emergence of languages in early human cultures.
(b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]).
(c) Pattern recognition of objects ranging from handwritten letters of the alphabet, to pictures of animals, to the human voice.
Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works).
Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17], among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.
Received by the editors April 2000 and, in revised form, June 1, 2001.
2000 Mathematics Subject Classification. Primary 68T05, 68P30.
This work has been substantially funded by CERG grant No. 9040457 and City University grant No. 8780043.
© 2001 American Mathematical Society
General conditions for predictivity in learning theory
Tomaso Poggio(1), Ryan Rifkin(1,4), Sayan Mukherjee(1,3) & Partha Niyogi(2)
(1) Center for Biological and Computational Learning, McGovern Institute, Computer Science Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA. (2) Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA. (3) Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA. (4) Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA.
Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
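The stability property described in this abstract can be illustrated numerically: refit a regularized least-squares model with one training example deleted, and measure how much the learned hypothesis moves. The linear model, data distribution and regularization value below are illustrative choices, not the paper's experiments:

```python
import numpy as np

def fit_ridge(X, y, mu=1.0):
    """Regularized linear least squares: w = (X^T X + mu I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + mu * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w_full = fit_ridge(X, y)

# Delete each example in turn and record the worst-case change in predictions.
changes = []
for i in range(n):
    mask = np.arange(n) != i
    w_loo = fit_ridge(X[mask], y[mask])
    changes.append(np.max(np.abs(X @ (w_full - w_loo))))

max_change = max(changes)   # small: deleting one point barely moves the hypothesis
```

Regularization is what makes this deletion perturbation small; an unstable procedure (for example, an unregularized fit with as many parameters as examples) can swing wildly when one point is removed, and by the paper's result such a procedure cannot be predictive.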
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case, the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form $S = \{(x_i, y_i)\}_{i=1}^{n}$. Training means synthesizing a function that best represents the relation between the inputs $x_i$ and the corresponding outputs $y_i$.
The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).
Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
Box 1: Formal definitions in supervised learning
Convergence in probability. A sequence of random variables $X_n$ converges in probability to a random variable $X$ (that is, $\lim_{n\to\infty} |X_n - X| = 0$ in probability) if and only if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\big(|X_n - X| > \varepsilon\big) = 0$$
Training data. The training data comprise input and output pairs. The input data $X$ is assumed to be a compact domain in a Euclidean space, and the output data $Y$ is assumed to be a closed subset of $\mathbb{R}^k$. There is an unknown probability distribution $\mu(x, y)$ on the product space $Z = X \times Y$. The training set $S$ consists of $n$ independent and identically drawn samples from the distribution on $Z$:
$$S = \big(z_1 = (x_1, y_1), \ldots, z_n = (x_n, y_n)\big)$$
Learning algorithms. A learning algorithm takes as input a data set $S$ and outputs a function $f_S$ that represents the relation between the input $x$ and output $y$. Formally, the algorithm can be stated as a map $L: \bigcup_{n \geq 1} Z^n \to \mathcal{H}$, where $\mathcal{H}$, called the hypothesis space, is the space of functions the algorithm 'searches' to select $f_S$. We assume that the algorithm is symmetric; that is, $f_S$ does not depend on the ordering of the samples in $S$. Most learning algorithms are either regression or classification algorithms, depending on whether $y$ is real-valued or binary-valued.
Loss functions. We denote by $V(f, z)$ the price we pay when the prediction for a given $x$ is $f(x)$ and the true value is $y$. We assume that $V(f, z)$ is always bounded. A classical example of a loss function is the square loss, $V(f, z) = (f(x) - y)^2$.
Expected error. The expected error of a function $f$ is defined as
$$I[f] = \int_Z V(f, z)\, d\mu(z)$$
which is also the expected error of a new sample $z$ drawn from the distribution. In the case of square loss,
$$I[f] = \int_{X \times Y} (f(x) - y)^2\, d\mu(x, y)$$
We would like to find functions for which $I[f]$ is small. However, we cannot compute $I[f]$ because we are not given the distribution $\mu$.
Empirical error. The following quantity, called empirical error, can be computed given the training data $S$:
$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)$$
Generalization and consistency. An algorithm generalizes if the function $f_S$ selected by it satisfies, for all $S$ ($|S| = n$) and uniformly for any probability distribution $\mu$,
$$\lim_{n\to\infty} \big|I[f_S] - I_S[f_S]\big| = 0 \quad \text{in probability}$$
An algorithm is (universally) consistent if, uniformly for any distribution $\mu$ and any $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}\Big( I[f_S] > \inf_{f \in \mathcal{H}} I[f] + \varepsilon \Big) = 0$$
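The definitions in Box 1 can be checked by simulation: for a fixed hypothesis f, the empirical error $I_S[f]$ converges to the expected error $I[f]$ as n grows. The distribution, target function and noise level below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 2.0 * x                      # a fixed hypothesis
target = lambda x: 2.0 * x + 0.5           # true regression function (offset by 0.5)

def empirical_error(n):
    """I_S[f] with square loss on a fresh sample of size n."""
    x = rng.uniform(0, 1, size=n)
    y = target(x) + 0.1 * rng.normal(size=n)
    return np.mean((f(x) - y) ** 2)

# I[f] = E[(f(x) - y)^2] = 0.5^2 (squared bias) + 0.1^2 (noise variance) = 0.26
expected = 0.26
gap_small_n = abs(empirical_error(10) - expected)
gap_large_n = abs(empirical_error(100_000) - expected)   # shrinks as n grows
```

This is convergence for a single fixed f (the law of large numbers); generalization in the sense of Box 1 is the stronger requirement that the gap vanish for the data-dependent function $f_S$ chosen by the algorithm, which is exactly what the stability condition of the paper guarantees.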
letters to nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
IS$f 1
n
X
n
i1
Vf zi
Generalization and consistency An algorithm generalizes if thefunction fS selected by it satisfies for all S (jSj n) and uniformly forany probability distribution m
n1lim jI$fS2 IS$fSj 0 in probability
An algorithm is (universally) consistent if uniformly for any distributionm and any e 0
n1lim P I$fSf2Hinf I$famp 1
0
letters to nature
NATURE |VOL 428 | 25 MARCH 2004 | wwwnaturecomnature 419copy 2004 Nature Publishing Group
Why do hierarchical architectures work?
• Training database: 1,000+ real and 3,000+ virtual face patterns; 500,000+ non-face patterns
Sung & Poggio 1995
~15-year-old CBCL computer vision research: face detection, on the market since 2006 (digital cameras)
Third Annual NSF Site Visit, June 8-9, 2016
Moore-like law for ML (1995-2018)
• Human brain
  – 10^10-10^11 neurons (~1 million flies)
  – 10^14-10^15 synapses
Vision: what is where
• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (anterior inferotemporal) cortex
Van Essen & Anderson 1990
The ventral stream hierarchy: V1 → V2 → V4 → IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA; Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.

Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.

Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkeys' environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552-563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.

Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4-8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5, No 5, p. 552
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis and Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask, to test the feedforward model): animal present or not?
Image: 20 ms; image-mask interval (ISI): 30 ms; mask (1/f noise): 80 ms
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Macé et al. 2005
Feedforward models "predict" rapid categorization (82% correct for the model vs 80% for humans)
Hierarchical feedforward models of the ventral stream
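The core motif of these hierarchical feedforward models is alternating template matching ("simple" units tuned to a feature) and MAX pooling ("complex" units pooling over positions), as in Riesenhuber & Poggio 1999. A minimal sketch of one such S-C stage follows; the template and toy images are illustrative assumptions, not the actual model parameters:

```python
import numpy as np

def s_layer(image, template):
    # Tuned "simple" units: template similarity at every valid position.
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * template)
    return out

def c_layer(s_responses):
    # "Complex" units: MAX pooling over position gives position invariance.
    return s_responses.max()

template = np.array([[1.0, -1.0], [1.0, -1.0]])  # a toy vertical-edge template

img_a = np.zeros((8, 8)); img_a[2:4, 1] = 1.0    # edge near the upper left
img_b = np.zeros((8, 8)); img_b[5:7, 6] = 1.0    # same edge, shifted

r_a = c_layer(s_layer(img_a, template))
r_b = c_layer(s_layer(img_b, template))
print(r_a, r_b)  # identical responses: the C unit does not care where the edge is
```

Stacking such stages yields the gradual growth of receptive field size and invariance described for V1 → V2 → V4 → IT above.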
Decoding the neural code: matrix-like read-out from the brain
Agreement of the model with IT read-out data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
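The read-out idea can be sketched with simulated data. The population code below is a hypothetical stand-in for the IT recordings, and the one-vs-all least-squares classifier stands in for the linear SVM read-out of Hung et al.; the point is only that a linear decoder trained at one position generalizes to others when identity information is position-tolerant:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: each object evokes a characteristic pattern that is
# (noisily) preserved across positions.
n_units, n_objects, n_positions, n_trials = 50, 4, 3, 20
signatures = rng.normal(size=(n_objects, n_units))        # identity code
position_offsets = 0.3 * rng.normal(size=(n_positions, n_units))

def record(obj, pos):
    return signatures[obj] + position_offsets[pos] + 0.5 * rng.normal(size=n_units)

# Train a linear read-out at position 0 only.
X_train = np.array([record(o, 0) for o in range(n_objects) for _ in range(n_trials)])
y_train = np.repeat(np.arange(n_objects), n_trials)

# One-vs-all least-squares linear classifier (simple stand-in for the SVM).
Y = np.eye(n_objects)[y_train]
W, *_ = np.linalg.lstsq(np.c_[X_train, np.ones(len(X_train))], Y, rcond=None)

# Test at the held-out positions: invariant read-out of identity.
correct = total = 0
for pos in (1, 2):
    for o in range(n_objects):
        for _ in range(n_trials):
            pred = np.argmax(np.r_[record(o, pos), 1.0] @ W)
            correct += int(pred == o)
            total += 1
print(correct / total)  # well above the 1/4 chance level
```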
... in 2013 ...
General conditions for predictivity in learning theory
Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi

Center for Biological and Computational Learning, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Brain Sciences Department, MIT, Cambridge, Massachusetts 02139, USA; Departments of Computer Science and Statistics, University of Chicago, Chicago, Illinois 60637, USA; Cancer Genomics Group, Center for Genome Research, Whitehead Institute/MIT, Cambridge, Massachusetts 02139, USA; Honda Research Institute USA Inc., Boston, Massachusetts 02111, USA

Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory [1-5] was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.
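The stability property at the center of the letter can be illustrated numerically. The sketch below is a toy construction, not the paper's formal leave-one-out stability definition: it fits regularized least squares, deletes one training example at a time, refits, and checks that the learned weight vector barely moves:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Regularized least squares: w = (X^T X + lam*n*I)^(-1) X^T y
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 0.1
w_full = ridge_fit(X, y, lam)

# Perturb the training set by deleting one example; stability says the
# learned hypothesis should barely change.
changes = []
for i in range(n):
    w_loo = ridge_fit(np.delete(X, i, axis=0), np.delete(y, i), lam)
    changes.append(np.linalg.norm(w_full - w_loo))

print(max(changes))  # tiny compared with the norm of w_full
```

Regularization is what buys this stability; the paper's result is that a property of this kind, suitably formalized, is what guarantees generalization.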
One of the main impacts of learning theory is on engineering. Systems that learn from examples to perform a specific task have many applications [6]. For instance, a system may be needed to recognize whether an image contains a face or not. Such a system could be trained with positive and negative examples: images with and without faces. In this case the input image is a point in a multidimensional space of variables such as pixel values; its associated output is a binary 'yes' or 'no' label.
In the auditory domain, one may consider a variety of problems. Consider speaker authentication: the input is an acoustic utterance, and the system has to determine whether it was produced by a particular target speaker or not. Training examples would then consist of a set of utterances, each labelled according to whether or not they were produced by the target speaker. Similarly, in speech recognition, one wishes to learn a function that maps acoustic utterances to their underlying phonetic sequences. In learning the syntax of a language, one wishes to learn a function that maps sequences of words to their grammaticality values. These functions could be acquired from training data.
In another application, in computational biology, algorithms have been developed that can produce a diagnosis of the type of cancer from a set of measurements of the expression level of many thousands of human genes in a biopsy of the tumour, measured with a complementary DNA microarray containing probes for a number of genes. Again, the software learns the classification rule from a set of examples, that is, from examples of expression patterns in a number of patients with known diagnoses.
What we assume in the above examples is a machine that is trained, instead of programmed, to perform a task, given data of the form S = {(x_i, y_i) : i = 1, ..., n}. Training means synthesizing a function that best represents the relation between the inputs x_i and the corresponding outputs y_i.

The basic requirement for any learning algorithm is generalization: the performance on the training examples (empirical error) must be a good indicator of the performance on future examples (expected error); that is, the difference between the two must be 'small' (see Box 1 for definitions; see also Fig. 1).

Probably the most natural learning algorithm is ERM: the algorithm 'looks' at the training set S and selects as the estimated function the one that minimizes the empirical error (training error) over the functions contained in a hypothesis space of candidate functions.
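ERM can be made concrete with a toy hypothesis space. The data and the threshold family below are illustrative assumptions, not from the paper: labels follow a threshold rule on a one-dimensional feature, and ERM searches a grid of threshold classifiers for the one with minimal training error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary task: 1-D feature x, label y = 1 exactly when x > 0.3.
x = rng.uniform(-1, 1, size=100)
y = (x > 0.3).astype(int)

# Hypothesis space H: threshold classifiers h_t(x) = 1[x > t].
thresholds = np.linspace(-1, 1, 201)

def empirical_error(t):
    # I_S[h_t]: fraction of training examples misclassified by h_t.
    return np.mean((x > t).astype(int) != y)

# ERM: select the hypothesis in H minimizing the empirical (training) error.
t_erm = min(thresholds, key=empirical_error)
print(t_erm, empirical_error(t_erm))
```

Because this hypothesis space is small (one parameter), the empirical error of the ERM solution is a good indicator of its expected error, exactly the generalization requirement stated above.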
Box 1: Formal definitions in supervised learning

Convergence in probability. A sequence of random variables X_n converges in probability to a random variable X (for example, lim_{n→∞} |X_n − X| = 0 in probability) if and only if, for every ε > 0,

  lim_{n→∞} P(|X_n − X| ≥ ε) = 0

Training data. The training data comprise input and output pairs. The input data X is assumed to be a compact domain in a Euclidean space, and the output data Y is assumed to be a closed subset of R^k. There is an unknown probability distribution μ(x, y) on the product space Z = X × Y. The training set S consists of n independent and identically drawn samples from the distribution on Z:

  S = {z_1 = (x_1, y_1), ..., z_n = (x_n, y_n)}

Learning algorithms. A learning algorithm takes as input a data set S and outputs a function f_S that represents the relation between the input x and output y. Formally, the algorithm can be stated as a map L: ∪_{n≥1} Z^n → H, where H, called the hypothesis space, is the space of functions the algorithm 'searches' to select f_S. We assume that the algorithm is symmetric, that is, f_S does not depend on the ordering of the samples in S. Most learning algorithms are either regression or classification algorithms, depending on whether y is real-valued or binary-valued.

Loss functions. We denote the price we pay with V(f, z) when the prediction for a given x is f(x) and the true value is y. We assume that V(f, z) is always bounded. A classical example of a loss function is the square loss V(f, z) = (f(x) − y)^2.

Expected error. The expected error of a function f is defined as

  I[f] = ∫_Z V(f, z) dμ(z)

which is also the expected error of a new sample z drawn from the distribution. In the case of square loss,

  I[f] = ∫_{X×Y} (f(x) − y)^2 dμ(x, y)

We would like to find functions for which I[f] is small. However, we cannot compute I[f] because we are not given the distribution μ.

Empirical error. The following quantity, called empirical error, can be computed given the training data S:

  I_S[f] = (1/n) Σ_{i=1}^n V(f, z_i)

Generalization and consistency. An algorithm generalizes if the function f_S selected by it satisfies, for all S (|S| = n) and uniformly for any probability distribution μ,

  lim_{n→∞} |I[f_S] − I_S[f_S]| = 0 in probability

An algorithm is (universally) consistent if, uniformly for any distribution μ and any ε > 0,

  lim_{n→∞} P( I[f_S] ≥ inf_{f∈H} I[f] + ε ) = 0
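The generalization gap |I[f_S] − I_S[f_S]| defined in Box 1 can be estimated numerically. The distribution and model below are illustrative assumptions: x is uniform, y is linear in x plus noise, the loss is the square loss, and a large fresh sample approximates the expected error I[f]:

```python
import numpy as np

rng = np.random.default_rng(3)

# Distribution mu: x uniform on [0, 1], y = 2x + Gaussian noise; square loss.
def sample(n):
    x = rng.uniform(0, 1, n)
    return x, 2 * x + 0.2 * rng.normal(size=n)

def fit_and_gap(n):
    x, y = sample(n)
    a = np.sum(x * y) / np.sum(x * x)        # least squares through the origin
    emp = np.mean((a * x - y) ** 2)          # I_S[f]: empirical error on S
    xt, yt = sample(100_000)                 # fresh samples approximate I[f]
    exp_err = np.mean((a * xt - yt) ** 2)
    return abs(exp_err - emp)                # the generalization gap

# The gap |I[f_S] - I_S[f_S]| shrinks as the training set grows.
gaps = {n: np.mean([fit_and_gap(n) for _ in range(30)]) for n in (10, 1000)}
print(gaps)
```

The shrinking gap is exactly the convergence-in-probability statement in the generalization definition, observed empirically for one simple algorithm.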
Letters to Nature
NATURE | VOL 428 | 25 MARCH 2004 | www.nature.com/nature, p. 419. © 2004 Nature Publishing Group
Why do hierarchical architectures work
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
Why do hierarchical architectures work
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
bull Training Database bull 1000+ Real 3000+ VIRTUAL bull 500000+ Non-Face Pattern
Sung amp Poggio 1995
~15 year old CBCL computer vision research face detection
since 2006 on the market (digital cameras)
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in receptive field size, in the "complexity" of the preferred stimulus, and in "invariance" to position and scale changes
Kobatake & Tanaka 1994
Vision: ventral stream
Cognition in people
Shape representation in the inferior temporal cortex of monkeys
Nikos K. Logothetis, Jon Pauls and Tomaso Poggio†
Division of Neuroscience, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA. †Center for Computational and Biological Learning and Department of Brain Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Background: The inferior temporal cortex (IT) of the monkey has long been known to play an essential role in visual object recognition. Damage to this area results in severe deficits in perceptual learning and object recognition, without significantly affecting basic visual capacities. Consistent with these ablation studies is the discovery of IT neurons that respond to complex two-dimensional visual patterns, or objects such as faces or body parts. What is the role of these neurons in object recognition? Is such a complex configurational selectivity specific to biologically meaningful objects, or does it develop as a result of extensive exposure to any objects whose identification relies on subtle shape differences? If so, would IT neurons respond selectively to recently learned views or features of novel objects? The present study addresses this question by using combined psychophysical and electrophysiological experiments, in which monkeys learned to classify and recognize computer-generated three-dimensional objects.
Results: A population of IT neurons was found that responded selectively to views of previously unfamiliar objects. The cells discharged maximally to one view of an object, and their response declined gradually as the object was rotated away from this preferred view. No selective responses were ever encountered for views that the animal systematically failed to recognize. Most neurons also exhibited orientation-dependent responses during view-plane rotations. Some neurons were found to be tuned around two views of the same object, and a very small number of cells responded in a view-invariant manner. For the five different objects that were used extensively during the training of the animals, and for which behavioral performance became view-independent, multiple cells were found that were tuned around different views of the same object. A number of view-selective units showed response invariance for changes in the size of the object or the position of its image within the parafovea.
Conclusion: Our results suggest that IT neurons can develop a complex receptive field organization as a consequence of extensive training in the discrimination and recognition of objects. None of these objects had any prior meaning for the animal, nor did they resemble anything familiar in the monkey's environment. Simple geometric features did not appear to account for the neurons' selective responses. These findings support the idea that a population of neurons, each tuned to a different object aspect and each showing a certain degree of invariance to image transformations, may as an ensemble encode at least some types of complex three-dimensional objects. In such a system, several neurons may be active for any given vantage point, with a single unit acting like a blurred template for a limited neighborhood of a single view.
Current Biology 1995, 5:552–563
Background
Object recognition can be thought of as the process of matching the image of an object to its representation stored in memory. Because different viewing, illumination and context conditions generate different retinal images, understanding the nature of the stored representation, and the process by which sensory input is normalized, is one of the greatest challenges in research on visual object recognition. It is well known that familiar objects are recognized regardless of viewing angle, scale or position in the visual field. How is such perceptual object constancy accomplished? Does the brain transform the sensory or stored representation to discard the image variability resulting from different viewing conditions, or does generalization occur as a consequence of perceptual learning, that is, of being acquainted with different instances of any given object?
Most theories which postulate that transformations of an image representation precede matching assume either a complete three-dimensional description of an object [1], or a structural description of the image that specifies the relationships among viewpoint-invariant volumetric primitives [2,3]. In such theories, the locations are specified in a coordinate system defined by the viewed object. In contrast, theories assuming perceptual learning are viewer-centered, postulating that three-dimensional objects are modelled as a set of familiar two-dimensional views, or aspects, and that recognition consists of matching image features against the views held in this set.
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4–8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
Current Biology 1995, Vol 5 No 5 (p. 552)
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
Third Annual NSF Site Visit June 8 ndash 9 2016
Moore-like law for ML (1995-2018)
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
bull HumanBrainndash 1010-1011neurons(~1millionflies)ndash 1014-1015synapses
Visionwhatiswhere
bull Ventralstreaminrhesusmonkeyndash ~109neuronsintheventralstream(350106ineachemisphere)
ndash ~15106neuronsinAIT(AnteriorInferoTemporal)cortex
Van Essen amp Anderson 1990
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
The ventral stream hierarchy V1 V2 V4 IT
A gradual increase in the receptive field size in the ldquocomplexityrdquo of the preferred stimulus in ldquoinvariancerdquo to
position and scale changes
Kobatake amp Tanaka 1994
Visionventralstream
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict theview-independent recognition of familiar objects [3]they fail to account for performance in recognition taskswith certain types of novel objects [4-8] Viewer-cen-tered models on the other hand which can account forthe performance of human subjects in any recognitiontask are usually considered implausible because of theamount of memory a system would require to store alldiscriminable views of many objects These objectionshowever have recently been challenged by computer
Correspondence to Nikos K Logothetis E-mail address nikosbcmtmcedu
Current Biology 1995 Vol 5 No 5552
9520 spring 2003
Modelrsquos early predictions neurons become view-tuned during recognition
Poggio Edelman Riesenhuber (1990 2000)
Logothetis Pauls and Poggio 1995 Logothetis Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber amp Poggio 1999 2000 Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005 Serre Oliva Poggio 2007
Database collected by Oliva amp Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model)
Animal present or not
30 ms ISI
20 ms
Image
Interval Image-Mask
Mask 1f noise
80 ms
Thorpe et al 1996 Van Rullen amp Koch 2003 Bacon-Mace et al 2005
Feedforward Models ldquopredictrdquo rapid categorization (82 model vs 80 humans)
Hierarchical feedforward models of the ventral stream
Decoding the neural code Matrix-like read-out from the brain
Agreement of model w| IT Readout data Reading out category and identity invariant to position and scale
Hung Kreiman Poggio DiCarlo 2005
Serre Kouh Cadieu Knoblich Kreiman amp Poggio 2005
helliphelliphellip in 2013helliphellip
74
Cognition in people
Shape representation in the inferior temporalcortex of monkeys
Nikos K Logothetis Jon Pauls and Tomaso PoggiotDivision of Neuroscience Baylor College of Medicine One Baylor Plaza Houston Texas 77030 USA tCenter for Computational andBiological Learning and Department of Brain Sciences Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA
Background The inferior temporal cortex (IT) of themonkey has long been known to play an essential role invisual object recognition Damage to this area results insevere deficits in perceptual learning and object recog-nition without significantly affecting basic visual capaci-ties Consistent with these ablation studies is the discoveryof IT neurons that respond to complex two-dimensionalvisual patterns or objects such as faces or body partsWhat is the role of these neurons in object recognition Issuch a complex configurational selectivity specific to bio-logically meaningful objects or does it develop as a resultof extensive exposure to any objects whose identificationrelies on subtle shape differences If so would IT neuronsrespond selectively to recently learned views or features ofnovel objects The present study addresses this questionby using combined psychophysical and electrophysiologi-cal experiments in which monkeys learned to classify andrecognize computer-generated three-dimensional objectsResults A population of IT neurons was found thatresponded selectively to views of previously unfamiliarobjects The cells discharged maximally to one view ofan object and their response declined gradually as theobject was rotated away from this preferred view Noselective responses were ever encountered for views thatthe animal systematically failed to recognize Most neu-rons also exhibited orientation-dependent responses during
view-plane rotations Some neurons were found to betuned around two views of the same object and a verysmall number of cells responded in a view-invariant man-ner For the five different objects that were used exten-sively during the training of the animals and for whichbehavioral performance became view-independent mul-tiple cells were found that were tuned around differentviews of the same object A number of view-selective unitsshowed response invariance for changes in the size of theobject or the position of its image within the parafoveaConclusion Our results suggest that IT neurons candevelop a complex receptive field organization as a con-sequence of extensive training in the discrimination andrecognition of objects None of these objects had anyprior meaning for the animal nor did they resemble any-thing familiar in the monkeys environment Simplegeometric features did not appear to account for theneurons selective responses These findings support theidea that a population of neurons - each tuned to a dif-ferent object aspect and each showing a certain degreeof invariance to image transformations - may as an en-semble encode at least some types of complex three-dimensional objects In such a system several neurons maybe active for any given vantage point with a single unitacting like a blurred template for a limited neighborhoodof a single view
Current Biology 1995 5552-563
Background
Object recognition can be thought of as the process ofmatching the image of an object to its representationstored in memory Because different viewing illumina-tion and context conditions generate different retinalimages understanding the nature of the stored represen-tation and the process by which sensory input is normal-ized is one of the greatest challenges in research on visualobject recognition It is well known that familiar objectsare recognized regardless of viewing angle scale or posi-tion in the visual field How is such perceptual objectconstancy accomplished Does the brain transform thesensory or stored representation to discard the imagevariability resulting from different viewing conditionsor does generalization occur as a consequence of percep-tual learning that is of being acquainted with differentinstances of any given object
Most theories which postulate that transformations of animage representation precede matching assume either a
complete three-dimensional description of an object [1]or a structural description of the image that specifiesthe relationships among viewpoint-invariant volumetricprimitives [23] In such theories the locations are speci-fied in a coordinate system defined by the viewed objectIn contrast theories assuming perceptual learning areviewer-centered postulating that three-dimensional ob-jects are modelled as a set of familiar two-dimensionalviews or aspects and that recognition consists ofmatching image features against the views held in this set
Whereas object-centered theories correctly predict the view-independent recognition of familiar objects [3], they fail to account for performance in recognition tasks with certain types of novel objects [4–8]. Viewer-centered models, on the other hand, which can account for the performance of human subjects in any recognition task, are usually considered implausible because of the amount of memory a system would require to store all discriminable views of many objects. These objections, however, have recently been challenged by computer
Correspondence to: Nikos K. Logothetis. E-mail address: nikos@bcm.tmc.edu
9.520, Spring 2003
Model's early predictions: neurons become view-tuned during recognition
Poggio, Edelman, Riesenhuber (1990, 2000)
Logothetis, Pauls and Poggio 1995; Logothetis, Pauls 1995
A model of the ventral stream in visual cortex (starting from work with Buelthoff and Logothetis)
Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva, Poggio 2007
Database collected by Oliva & Torralba
Psychophysics of rapid categorization
Rapid categorization task (with mask to test feedforward model): animal present or not?
[Trial sequence: image, 20 ms; image–mask interval (ISI), 30 ms; mask (1/f noise), 80 ms]
Thorpe et al. 1996; Van Rullen & Koch 2003; Bacon-Mace et al. 2005
Feedforward models "predict" rapid categorization (82% model vs. 80% humans)
Hierarchical feedforward models of the ventral stream
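The hierarchical feedforward architecture referred to here alternates template-matching ("S") stages with max-pooling ("C") stages, in the spirit of the Riesenhuber & Poggio HMAX model; the max over position is what buys tolerance to translation. The following is a minimal numpy sketch under that assumption; the filters, image size, and pooling size are made-up stand-ins, not the model's actual parameters.

```python
import numpy as np

def s_layer(image, filters):
    """S ('simple') stage: tuning, computed here as the correlation of
    each template with every image patch (valid positions only)."""
    fh, fw = filters[0].shape
    H, W = image.shape
    out = np.zeros((len(filters), H - fh + 1, W - fw + 1))
    for k, f in enumerate(filters):
        for i in range(H - fh + 1):
            for j in range(W - fw + 1):
                out[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * f)
    return out

def c_layer(s_maps, pool=2):
    """C ('complex') stage: local MAX pooling over position, giving the
    response some invariance to where the feature appears."""
    K, H, W = s_maps.shape
    out = np.zeros((K, H // pool, W // pool))
    for k in range(K):
        for i in range(H // pool):
            for j in range(W // pool):
                out[k, i, j] = s_maps[k, i * pool:(i + 1) * pool,
                                      j * pool:(j + 1) * pool].max()
    return out

# Toy run: two hypothetical 3x3 oriented templates on a random "image".
rng = np.random.default_rng(0)
image = rng.normal(size=(12, 12))
filters = [np.eye(3), np.fliplr(np.eye(3))]  # stand-ins for Gabor-like templates
c1 = c_layer(s_layer(image, filters), pool=2)
print(c1.shape)  # (2, 5, 5)
```

Stacking further S/C pairs on top of `c1` yields units tuned to increasingly complex features with increasing position and scale tolerance, which is the core idea of the hierarchy.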
Decoding the neural code: matrix-like read-out from the brain
Agreement of model with IT readout data: reading out category and identity, invariant to position and scale
Hung, Kreiman, Poggio, DiCarlo 2005
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
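The "matrix-like read-out" of Hung et al. 2005 amounts to applying a linear classifier to a population of IT responses. A hedged sketch of that decoding step on synthetic data (the population size, noise level, and category patterns are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 64, 200

# Synthetic "IT population responses": each category evokes a noisy
# version of its own mean population pattern (all numbers made up).
patterns = rng.normal(size=(2, n_neurons))           # two object categories
labels = rng.integers(0, 2, size=n_trials)
responses = patterns[labels] + 0.5 * rng.normal(size=(n_trials, n_neurons))

# Matrix-like read-out: a linear classifier fit by least squares.
X = np.hstack([responses, np.ones((n_trials, 1))])   # add bias column
y = 2.0 * labels - 1.0                               # +/-1 targets
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Decode held-out trials generated the same way.
test_labels = rng.integers(0, 2, size=100)
test_X = patterns[test_labels] + 0.5 * rng.normal(size=(100, n_neurons))
pred = (np.hstack([test_X, np.ones((100, 1))]) @ w) > 0
accuracy = np.mean(pred == (test_labels == 1))
print(accuracy)  # well above chance on this toy data
```

The same recipe applied to responses recorded at different positions and scales is how the invariance of the read-out was tested: a classifier trained at one position/scale is evaluated at others.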
… in 2013 …