semantics in vision v2 - harvard universityklab.tch.harvard.edu › publications › pdfs ›...

31
What do neurons really want? The role of semantics in cortical 1 representations 2 3 Gabriel Kreiman 4 Children’s Hospital, Harvard Medical School 5 [email protected] 6 7 Keywords: visual recognition, deep convolutional network, computational models, 8 ventral visual cortex, abstract meaning, categorization, neurophysiology 9 10 Number of Figures: 2 11 12 Abstract 13 What visual inputs best trigger activity for a given neuron in cortex and what 14 type of semantic information may guide those neuronal responses? We revisit the 15 methodologies used so far to design visual experiments, and what those methodologies 16 have taught us about neural coding in visual cortex. Despite heroic and seminal work 17 in ventral visual cortex, we still do not know what types of visual features are optimal 18 for cortical neurons. We briefly review state-of-the-art standard models of visual 19 recognition and argue that such models should constitute the null hypothesis for any 20 measurement that purports to ascribe semantic meaning to neuronal responses. While 21 it remains unclear when, where, and how abstract semantic information is 22 incorporated in visual neurophysiology, there exists clear evidence of top-down 23 modulation in the form of attention, task-modulation and expectations. Such top-down 24 signals open the doors to some of the most exciting questions today towards 25 elucidating how abstract knowledge can be incorporated into our models of visual 26 processing. 27 28 29 30 31

Upload: others

Post on 27-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Whatdoneuronsreallywant?Theroleofsemanticsincortical1

representations2

3

GabrielKreiman4

Children’sHospital,HarvardMedicalSchool5

[email protected]

7

Keywords:visualrecognition,deepconvolutionalnetwork,computationalmodels,8

ventralvisualcortex,abstractmeaning,categorization,neurophysiology9

10

NumberofFigures:211

12

Abstract13

Whatvisualinputsbesttriggeractivityforagivenneuronincortexandwhat14

type of semantic information may guide those neuronal responses? We revisit the15

methodologiesusedsofartodesignvisualexperiments,andwhatthosemethodologies16

havetaughtusaboutneuralcodinginvisualcortex.Despiteheroicandseminalwork17

inventralvisualcortex,westilldonotknowwhattypesofvisualfeaturesareoptimal18

for cortical neurons. We briefly review state-of-the-art standard models of visual19

recognitionandarguethatsuchmodelsshouldconstitutethenullhypothesisforany20

measurementthatpurportstoascribesemanticmeaningtoneuronalresponses.While21

it remains unclear when, where, and how abstract semantic information is22

incorporated in visual neurophysiology, there exists clear evidence of top-down23

modulationintheformofattention,task-modulationandexpectations.Suchtop-down24

signals open the doors to some of the most exciting questions today towards25

elucidating how abstract knowledge can be incorporated into our models of visual26

processing.27

28

29

30

31

Page 2: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

InthisChapter,Iaimtohighlightcriticallacunainourunderstandingofthe32

tuningpropertiesofvisualneurons,especiallytheroleofhigh-levelknowledgein33

neuralcodingofvisualinputs.Firstofall,Ishouldclarifytheobvious.Neuronsdo34

not“want”anything.Aneuronemitsanactionpotentialwhenitsintracellular35

voltageexceedsacertainthreshold,typicallybutnotexclusively,intheaxonhillock36

(Koch,1999).Thisvoltageisaweightedsumoftheinfluencesreceivedthroughthe37

neuron’sthousandsofdendriticinputs,whichincludebottom-up,horizontal,and38

top-downconnections.Itremainsexperimentallychallengingtotraceallincoming39

signalstoagivencorticalneuronandtopropagatethosesignalsallthewaybackto40

thesensoryinputs,nottomentionallothernon-sensoryinputs.Thus,inthevast41

majorityofcases,wecorrelatetheactivityofacorticalneuronwiththepresentation42

ofsensorysignals.Itisinthissensethatthequestioninthetitleshouldbe43

understood.Iaskwhatsensoryinputsbesttriggeractivityforagivencorticalneuron44

andwhattypeofsemanticinformationmayguideormodulatethoseneuronal45

responses.46

Istartwithasuccinctdescriptionoftheclassicalviewonwhattypesofvisual47

stimulitriggeractivityinneuronsalongtheventralvisualcortex.Iintroducestate-48

of-the-artstandardcomputationalmodelsofvisionandconsiderthemasabasic49

nullhypothesistoevaluateneuronaltuningpropertiesandpotentialsemantic50

influences,particularlyinthecontextofvisualcategorizationtasks.Next,Iprovidea51

fewexamplesofhowtop-downsignalscanmodulateresponsesalongtheventral52

visualcortexwhileemphasizingthatwehavealongwaytogotounderstandthe53

roleofcommonknowledgeonvisualprocessing.Iconcludebydiscussingcritical54

Hilbertquestioninthefieldandabriefglimpseoftheampleopportunitiesand55

challengesahead.56

57

Assumptionsanddefinitions58

Ifocushereontriggeringactivityinthesenseoffiringratesdefinedasspike59

countsinshortwindowsspanningtensofmilliseconds.Thisisbynomeanstheonly60

oragreeduponrelevantpropertyofcorticalneurons,therehasbeenextensive61

discussionaboutneuralcodes(seeforexample(Kreiman,2004)).Aneuronmight62

Page 3: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

contributetorepresentinginformationbyfiringonlyafewspikesataprecisetime63

inconcertwithotherspikesinthenetwork.Aneuronmayalsorepresent64

informationbynotfiring,inthesamewaythatSherlockHolmesintuitedwhothe65

murdererwasbyattendingtothedogthatdidnotbark.Additionally,innon-66

invasivestudies,therearemultipleexperimentaltechniquesthatmeasurenon-67

neuronalsignalsthatarelesswellunderstoodandwhicharedifficulttointerpret68

directlyintermsofneuronalfiringrates.Thegeneralflavorofthediscussionhere69

couldwellbeextendedtootherneuralcodes,butintheinterestofsimplicitywe70

understandthequestioninthetitletoindicatewhattypeofsensoryinputsleadto71

highfiringratesforagivencorticalneuron.72

Afewassumptionsanddisclaimersarepertinentbeforeproceeding.Inorder73

toinvestigatewhatneuronswant,Irestrictthequestiontovisionandsolicit74

inspirationfrombiologicallyplausiblecomputationalmodelsofvision.Thefocuson75

visionismerelyapracticalone:(i)Weknowmoreaboutthearchitectureofthe76

visualsystemthanothersystems;(ii)Wecanstandontheshouldersofgiantsthat77

havepavedthewaythroughmorethanacenturyofbehavioralstudiesofvisionand78

overhalfacenturyofneurophysiologicalscrutinyofvisualcortex;(iii)Wehavean79

arsenaloftoolstosynthesizevisualstimuli,topreciselycontrolthetimingof80

presentation,tomeasureeyemovements,andtocapitalizeonmillionsofdigital81

imagesandvideos.Theemphasisonbiologicallyplausiblecomputationalmodelsof82

visionreflectstheneedtoformalizeourassumptions,andtogenerateacommon83

languagethatcanbeusedtodirectlytestourideasacrosslabsandacross84

experiments.Verbaldescriptionssuchas“neuronsinV2respondpreferentiallyto85

angles”or“neuronsinITrespondpreferentiallytoobjects”arenotsufficientand86

vague,lackpredictivepower,arehardtorejectorvalidate,andoftengetusinto87

trouble.Weneedmathematicalmodelsthatareinstantiatedintocomputercode88

wherewecanuseexactlythesameconditionsandexactlythesameimagesasin89

behavioralorneurophysiologicalexperiments.90

Beyondthepatternofinputspropagatedfromtheretinaontovisualcortex,91

whataneuronwantsislikelytobemodulatedbysemanticpriorknowledgeabout92

theworld,probablyconveyedthroughtop-downconnections.Whatexactlydowe93

Page 4: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

meanbysemantics?TheOxford’sEnglishDictionarydefinessemanticsas“…the94

branchoflinguisticsandlogicconcernedwithmeaning.”Howthisdefinitionapplies95

tointerpretingtheresponsesofneuronsalongventralvisualcortexisnotclear.We96

attempttoprovideamorequantitativedefinitionlateroninthischapter.Forthe97

moment,asanexampleofsemanticsinthecontextofvisualrecognition,we98

understandpicturesofgrapes,oranges,orpineapplestorepresentdifferenttypesof99

fruits,eventhoughtheyareratherdifferentintermsoftheirvisualfeatures.100

Similarly,werefertoants,elephantsorgoldfishasanimals.Wewillaskwhatroles101

thisandothertypesofhigh-levelknowledgeabouttheworldplaysinvisual102

processing.103

104

Neuronalresponsesinvisualcortex,theclassicalview105

106

Theintroductionoftechniquestorecordtheactivityofneuronsinthe107

beginningofthelastcenturyledtodecadesofexperimentsinterrogatingneuronal108

responsestovisualstimulation.Thehistoryofstudyingneuronaltuningproperties109

invisualcortexisthehistoryofvisualstimuli.Howdoweinvestigatethefeature110

preferencesofaneuroninvisualcortex?Weneedtodecidewhichstimulitousein111

theexperiments.Thecentralchallengeinansweringthisquestionisthatitis112

essentiallyimpossiblewithcurrent(andforeseeable)technologytoexhaustively113

exploretheentirespaceofimages:thenumberofpossibleimagesisbeyond114

astronomical.Consideringasmallimagepatchof100x100pixels,thereare115

210,000~103,010possiblebinaryimages,~1024,082grayscaleimageswith256shadesof116

grayperpixel,and~1072,2478-bitcolorimages.Asaconsequence,investigators117

havetraditionallyusedseveralastuteandreasonablestrategiestoselectvisual118

stimuliforexperiments:119

(i)Stimulifrompreviousstudies.Pastperformanceisastrongpredictorofcurrent120

performanceforneurons.Stimulithathaveexcitedneuronsinpreviousstudiesare121

oftenagoodinitialguesstodesignexperiments.Forexample,eversincethe122

discoverythatV1neuronsincatsandmonkeysrespondvigorouslytobarsor123

gratingsofspecificorientations(HubelandWiesel,1968),investigatorshaveused124

Page 5: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

orientedbarsandgratingstoprobetheresponsesessentiallyineveryspeciesand125

everyvisualarea(Coogan,1993;Chapmanetal.,1996;HegdeandVanEssen,2007;126

GhoseandMaunsell,2008;NiellandStryker,2010;Nassietal.,2014).Avariationof127

thisapproachistostartwitheffectivestimulifrompreviousstudiesandevaluate128

neuronalresponsestomodifiedversionsofthosestimuli(KobatakeandTanaka,129

1994;Tanaka,2003;Leopoldetal.,2006;Tsaoetal.,2006).130

(ii)Naturalstimuli.Itseemsreasonabletoassumethatneuronsrepresent131

behaviorallyrelevantstimuliandthetypesofimagesencounteredintherealworld.132

Thus,multiplestudieshaveprobedneuralresponsestonaturalimagesandmovies133

(OlshausenandField,1996;VinjeandGallant,2000;SimoncelliandOlshausen,134

2001;LesicaandStanley,2004;Okazawaetal.,2015;Isiketal.,2017),andalsoto135

everydayobjects(LogothetisandSheinberg,1996;SheinbergandLogothetis,2001;136

Hungetal.,2005;Liuetal.,2009),includingfaces(Desimoneetal.,1984;Allisonet137

al.,1994;Kanwisheretal.,1997;Tsaoetal.,2006).138

(iii)Semi-serendipitousfindings.HubelandWieselclaimedthattheydiscovered139

orientationtuningwhiletheywerescrutinizingtheactivityofprimaryvisualcortex140

neuronsandobservedtheresponseselicitedwhentheyinsertedaslideinthe141

projector(Hubel,1981).GrossandDesimoneobservedthatneuronsinITCfired142

vigorouslywhenoneoftheinvestigatorspassedinfrontofthemonkey,leadingto143

theinvestigationsaboutneuronsthatrespondtofacestimuli(Gross,1994).Our144

owndescriptionsofso-called“Clinton”or“Anniston”cellsinthehumanmedial145

temporallobewerealsofortuitous(Kreiman,2002;QuianQuirogaetal.,2005).146

Whiletheroleofluckcanbedebated,rigorousanalysesofneuralresponsestonovel147

stimulicanleadtodiscoveringunexpectedfeaturepreferences.148

(iv)Computationalmethods.Despiteenormousprogressindeveloping149

computationalmodelstoexplainandpredictneuralresponsesalongventralvisual150

cortex(RiesenhuberandPoggio,1999;Wuetal.,2006;Connoretal.,2007;Serreet151

al.,2007;DiCarloetal.,2012),therehavebeenfeweffortstousethosemodelsto152

createstimulithatefficientlydriveavisualneuron.Oneoftheseapproachesis153

reversecorrelationwherebyarapidsuccessionofwhitenoisestimuliispresented154

followedbyaveragingtheimagesprecedingspikes(Jonesetal.,1987).This155

Page 6: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

approachhasbeensuccessfulinelucidatingthestructureofreceptivefieldsinthe156

retinaand,tosomeextent,inprimaryvisualcortex,butitdoesnotseemtoworkin157

highervisualareas,duetotheaccumulationofnon-linearitiesandalsothecurseof158

dimensionalitydictatedbythereducedsamplingofstimulusspace.Ratherthan159

startingfromnoise,exploitingnaturalstimulusstatisticshasbeenaproductiveway160

ofsynthesizingimagesandpredictingresponsesinareasV1(OlshausenandField,161

1996;OlshausenandField,2004),V2(Freemanetal.,2013),andV4(Okazawaetal.,162

2015).Anelegantalternativeapproachistouseageneticalgorithmwherebythe163

neuronunderstudycanitselfdictatewhichstimuliitprefers.Asuccessful164

implementationofthisideabyConnorandcolleagues(Yamaneetal.,2008)has165

beenusedtoinvestigateselectivityinmacaqueareasV4andITC(Carlsonetal.,166

2011;Hungetal.,2012;VaziriandConnor,2016).167

Usingacombinationofthesestimulusselectionapproaches,seminalstudies168

ledtofoundationaldiscoveriesaboutvisualprocessing,includingcenter-surround169

receptivefields(Kuffler,1953),neuronsinprimaryvisualcortexthataretunedto170

theorientationofabarplacedwithintheirreceptivefields(HubelandWiesel,171

1962),neuronsinareaMTthatdiscriminatemotiondirection(Movshonand172

Newsome,1992),neuronsinareaV4thataresensitivetocolors(Zeki,1983)and173

curvature(Gallantetal.,1993;PasupathyandConnor,2001),selectivitytonatural174

objects(LogothetisandSheinberg,1996;DiCarloetal.,2012)includingfaces175

(Desimoneetal.,1984;Tsaoetal.,2006),amongmanyothers.Despitethese176

extensiveandheroicefforts,westilldonotknowthatanyofthosetuningproperties177

areoptimalforthoseneurons–whereoptimalmeanstriggeringhighfiringrates.It178

isconceivablethattherecouldbeotherstimulithatmightmorestronglydrive179

neuronsinallthoseareas.Mechanisticmodelscanhelpusunderstandhow180

neuronalresponsesariseandthusdesignbetterstimuli.Thelasttwodecadeshave181

seensignificantprogressinthedevelopmentofcomputationalmodelstohelpus182

understandneuraltuningpropertiesinvisualcortex.183

184

Computationalmodelsofventralvisualcortex185

Page 7: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Inspiredbyneuroanatomyandneurophysiology,manyinvestigatorshave186

developedcomputationalmodelsthatcapturethebasicprinciplesthat187

progressivelytransformapixel-likerepresentationofinputsintocomplexfeatures188

thatcanbelinearlydecodedtorecognizeobjects(Fukushima,1980;Olshausenet189

al.,1993;Mel,1997;WallisandRolls,1997;RiesenhuberandPoggio,1999;Deco190

andRolls,2004;Serreetal.,2007;DiCarloetal.,2012).Morerecently,thisfamilyof191

modelshastakenoverthecomputervisioncommunityintheformofdeep192

convolutionalnetworkarchitecturesthatperformquitewellinmanyobjectlabeling193

andobjectdetectiontasks(Krizhevskyetal.,2012;SimonyanandZisserman,2014;194

Heetal.,2015;Heetal.,2018;Serre,2019).195

Whilethereareimportantvariationsacrossdifferentmodels,theyallshare196

basicdesignprinciplesandwegenericallyrefertothemasafamilyofmodels.The197

modelsarehierarchical,typicallyfollowingasequentialpathofoperations,198

mimickingtheapproximatelyhierarchicalnatureofventralvisualcortex(Felleman199

andVanEssen,1991).Themodelsconsistofmultiplelayers,followingadivide-and-200

conquerstrategybreakingtheproblemofobjectrecognitionintomultiplesmaller201

andsimplersteps.Eachofthesestepsischaracterizedbyaseriesofbiologically202

plausiblecanonicalcomputations,typicallyincludingafilterimplementedbyadot203

product,anormalizationstep,andamaxpoolingoperation.Inmostofthesteps,the204

operationsareperformedinaconvolutionalfashion,suchthatthesame205

computationisrepeatedthroughouttheentirevisualfield.Thedotproduct206

operationischaracterizedbysetsofweightsthatarelearntviatraining.Inthe207

computervisionliterature,aprominentwayoftrainingtheseweightsisvia208

supervisedlearningalgorithmsimplementingback-propagation.Throughthe209

sequenceofcomputations,unitsshowtuningforincreasinglymorecomplex210

features,accompaniedbyanincreasingdegreeofinvariancetotransformationsof211

thosefeaturessuchaschangesinposition,scale,etc.212

Thesemodelsperformquitewellinmanyobjectlabelingtasks.Forexample,213

theResNetarchitectureachieveda4%top-5errorrateintheImageNetdataset214

consistingof1,000possiblecategories(Heetal.,2015).Thesemodelsalsoprovidea215

reasonable–yetcertainlyimperfect–firstorderapproximationtocharacterize216

Page 8: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

humanandmonkeybehavioralperformanceinrapidobjectcategorizationtasks217

(Russakovskyetal.,2014;Rajalinghametal.,2018).Forexample,arecentstudy218

showedthatdeepconvolutionalnetworkarchitecturesperformedaswellas,andin219

manycasesbetterthan,forensicfacialexaminerexperts,facialreviewersandso-220

calledfacial“super-recognizers”inafaceidentificationtask(Phillipsetal.,2018).221

Furthermore,theactivityofunitsinthesemodelscanbemappedontotheactivityof222

neuronsalongtheventralvisualcortex(Yaminsetal.,2014),evenextrapolating223

acrosscategorieswhenlearningthetransformationfrommodelstoneurons224

(O'Connelletal.,2018).225

Despitethemultiplesuccessesofthisfamilyofmodels,itisclearthatthey226

onlyscratchthesurfaceofwhatweneedtounderstandaboutvisualcortexand227

thereisalargeamountofneuroscienceandbehavioraldatathatcannotquitebe228

accountedbycurrentinstantiationsofthesealgorithms(Markovetal.,2014;229

Kubiliusetal.,2016;Ullmanetal.,2016;Linsleyetal.,2017;Tangetal.,2018;Serre,230

2019).Becausesuchmodelsdonotincorporateaspectsofcommonsensecognitive231

knowledgeabouttheworldotherthanwhatwasusedtolabelimagesfortraining,232

theyconstituteasuitablestandardbenchmarkandnullhypothesistocontrast233

againstforanystudythataimstoinvestigateanytypeofhigh-levelinformation234

encoding(Kreiman,2017).235

Withsomeexceptions,thisfamilyofmodelshasbeenlessconcernedwiththe236

rolesoftop-downinfluencesonventralvisualcortexresponses.Yet,thereis237

extensivedatadocumentinghowtop-downsignalscanmodulateneuronalactivity238

invision.Forexample,spatialattentioncanenhanceneuronalresponsesthroughout239

visualcortex(DesimoneandDuncan,1995;ReynoldsandHeeger,2009).Top-down240

influencesarealsomanifestedintheformofmodulationbytaskdemandsand241

expectations(GilbertandLi,2013). 242

Ofnote,thisfamilyofmodelsdoesnotexplicitlyincorporateanytypeof243

linguisticorsemanticencodingintheirdesign.Themodelsaretypicallytrainedto244

learntoseparateimagesthatwerelabeledasbelongingtodifferentclasses.For245

example,aninvestigatormaylabel1,000imagesaspineapples,andlabelanother246

setof1,000imagesaselephants.Themodelmaybetrainedviasupervisedlearning247

Page 9: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

toseparatethosetwogroupsofimagesandthealgorithmscitedabovecandoa248

remarkablejobinlabelingimages,includingextrapolatingtonovelpicturesof249

pineapplesandelephants.Wenextturnourattentiontoaskwhetherthisabilityto250

assigncategorylabelsindicatesanytypeofsemanticrepresentation.251

252

Category-selectiveresponsesdonotimplysemanticencoding253

Inmanytypicalneuroscienceexperiments,investigatorsmaypresentimages254

containingobjectsbelongingtodifferentcategories(Desimoneetal.,1984;255

LogothetisandSheinberg,1996;Sugaseetal.,1999;Vogels,1999;Kreimanetal.,256

2000b;Freedmanetal.,2001;Thomasetal.,2001;SigalaandLogothetis,2002;257

Hungetal.,2005;QuianQuirogaetal.,2005;Tsaoetal.,2006;Kianietal.,2007;258

Meyersetal.,2008;Liuetal.,2009;KourtziandConnor,2011;Mormannetal.,259

2011).Throughoutinferiortemporalcortex,andeveninareasofthemedial260

temporallobeandpre-frontalcortex,investigatorshavereportedselectiveneuronal261

responseswithhigherfiringrateselicitedbysomegroupsofstimulicomparedto262

others.Dothesedifferentialresponsesindicateanytypeofsemanticencoding?263

Tobeclearaboutwhatthisquestionmeans,wereturntothedefinitionof264

semanticsasalinguistrepresentationconcernedwithmeaning.Weunderstandthis265

definitiontoimplythatmeaningindicatesanabstractrepresentation,beyondwhat266

ispurelycapturedbythestimulusfeatures.Asystemoralgorithmthat267

comprehendssemanticinformationshouldbeabletocapturethelinkbetween268

lemonsandpineapples,anditshouldbeabletodiscernthatatennisballis269

functionallyclosertoatennisracquet,eventhoughitlooksmoresimilartoalemon.270

Toinvestigatewhetherdistinctneuronalresponsestodifferentgroupsof271

stimulireflectsemanticencoding,weturntothenullhypothesisforvisual272

representationsoutlinedintheprevioussection,namely,computationalmodelsof273

objectrecognition.ConsiderthemodelarchitectureshowninFigure1A,consisting274

ofaninputimageconveyedtoacascadeof3convolutionallayers(Conv1-Conv3)275

andafullyconnected(fc)layerthatclassifiesinputimagesintooneof6possible276

categories.Thereare6fcunitsthatindicatetheprobabilitythattheimagebelongs277

toeachofthe6categories.Thisisclearlyafarcryfromstate-of-the-artmodelsthat278

Page 10: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

includehundredsoflayersandcategorizehundredsofimages.Thedetailsofthe279

architecturearenottoorelevant;otherarchitecturesincludingstate-of-the-art280

computervisionmodelswouldproducesimilarresultstotheonesshownbelow.We281

deliberatelykeepitsimpleforillustrationpurposes,andtoprovidesourcecodethat282

caneasilyberanonanymachine(seelinksattheendoftheChapter).Thismodel283

wastrainedviaback-propagationusingimagesfrom6categoriesintheImageNet284

dataset(Russakovskyetal.,2014):biologicalcells(synsetnumbern00006484),285

Labradordogs(synsetnumbern02099712),fireants(synsetnumbern02221083),286

sportscars(synsetnumbern04285008),roses(synsetnumbern04971313),andice287

(synsetnumbern14915184).Examplesoftheseimagesareshowninthetoppartof288

Figure1B.Themodelwasabletoseparatethestimuluscategories:top-1289

performanceinacross-validatedsetwas78%(wherechanceis16.7%).A2D290

representationoftheactivationstrengthofthe6fcunitsatthetopofthemodelin291

responsetoeachoftheimagesisshowninFigure1B,usingadimensionality292

reductiontechniquecalledtSNEwhichmapsthe6dimensionaloutputvectoronto293

twodimensionsforvisualizationpurposes(vanderMaatenandHinton,2008).The294

colorsrepresentthe6differentcategories,whichclusterintooverlappingyet295

distinctgroups.Forexample,theimagesbelongingtothe“ice”category(pink)296

mostlyclusteredonthebottomleftwhileimagesbelongingtothe“rose”category297

(blue)mostlyclusteredonthetopinFigure1B.298

299

Wefurtherexaminedtheresponsesofeachofthe6fcunitstoallthe~8,000300

images(Figure1C).Forexample,intheleftmostcolumn,eachcirclecorrespondsto301

theactivationoffcunit1inresponsetooneoftheimages.Theverticaldottedlines302

separateimagesfromthe6differentcategories.Asexpected,basedonthewaythe303

modelwastrained,eachofthefcunitsshowedspecializationandrespondedmost304

stronglytooneoftheimagecategories.Forexample,fcunit1showedhigher305

activationonaveragetotheimagesfromthe“cell”category(red)comparedtoall306

theothercategories.Theresponseswerenotall-or-noneandshowedaconsiderable307

degreeofoverlapbetweencategories.Forexample,certainimagesofice(lastsetof308

images)yieldedstrongeractivationforfcunit1thansomeoftheimagesofcells309

Page 11: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

(firstsetofimages,comparedthetwocircleshighlightedbythetwoarrowsforfc310

unit1inFigure1C).Thefcunitsarecategoryunitsparexcellence:byconstruction,311

theiractivationdictateshowthemodelwilllabelaparticularimage.Yet,the312

distributionoftheiractivationpatternsshowsconsiderableoverlapacross313

categoricalborders.Eventhoughthemodeldoesadecentjobatseparatingthe6314

imagecategories,themodeldoesnotseemtohaveanynotionofsemantics.A315

zoomedinpictureofapinkcarmaywellbemisclassifiedasarose.Andthediverse316

andstrangepatternsofcellshapescanoftenbemisconstruedtoindicateiceorants.317

Theproblemintermsofsemanticsisnotwiththemodelperformanceitself.Deeper318

modelsandmoreextensivetrainingcanleadtohigherperformance.Toerris319

algorithmic,afterall.Thepointhereisthatthemodelhasnosenseofabstract320

meaning,beyondthesimilarityofshapefeatureswithinacategoryrepresentedbyits321

units.322

323

Wecanstillrefertofcunit5asa“roseunit”forsimplicity.Whatwemeanby324

a“roseunit”isaunitthatismorestrongly--butnotexclusively--activatedby325

imagesthatcontainvisualshapefeaturesthatarecommoninthesetofrosesin326

ImageNet.Theunitdoesnotknowanythingsemanticaboutrosesandcanshow327

highactivationforimagesfromothercategoriesandalsolowactivationforimages328

containingroses,dependingonthevisualshapefeaturespresentintheimage.329

330

Acomparisonthatpervadestheliteratureisthedistinctionbetweenimages331

labeledas“humanfaces”andimageslabeledas“houses”.Wouldthemodelin332

Figure1Abeabletodiscriminatehumanfacesversushouses?Onemightimagine333

thatthemodelshouldnotbeabletodistinguishhumanfacesfromhousesbecause334

themodelwasnevertrainedwithsuchimages.Evenifoneweretotrytoarguethat335

themodelhassomesortofconcrete,asopposedtoabstract,understandingofthe336

meaningofcells,sportscars,roses,etc.,themodelshouldhavenoknowledge337

whatsoeverabouthumanfacesorhouses.Inotherwords,byconstruction,the338

modelhasnosemanticinformationaboutfacesorhouses.Ifthemodelcanstill339

separatefacesfromhouses,thenanysuchseparationcannotbebasedonsemantic340

Page 12: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

knowledge.ToevaluatewhetherthemodelinFigure1Acanseparatepicturesof341

facesversushouses,weconsideredtwoadditionalcategoriesofimages:faces342

(synsetnumbern09618957),andhouses(synsetnumbern03545150).We343

extractedtheactivationpatternsofthe6fcunitsofthemodelinresponsetoeachof344

thosehumanfaceandhouseimageswithoutanyre-training(i.e.,themodelwas345

trainedtolabelthe6categoriesinFigure1Bandwemerelymeasuredthe346

activationinresponsetothesetwonewcategories).WeusedanSVMclassifierwith347

alinearkerneltodiscriminatepicturesofhumanfacesversushousesbasedonthe348

activityofthe6fcunits.Inotherwords,weaskedwhethertherepresentationgiven349

bythe“cellunit”,the“Labradorunit,the“fireantunit”,the“sportscarunit”,the350

“roseunit”,andthe“iceunit”wassufficienttoseparateimagesofhumanfacesand351

houses.Theclassifierachievedaperformanceof86%(wherechanceis50%).That352

is,thepatternofactivationofthe6fcunits–whicharespecializedtodiscriminate353

cells,Labradordogs,fireants,sportscars,roses,andice–canwellseparatepictures354

ofhumanfacesfromhouses.A2Drenderingoftheactivationpatternsofthose6fc355

unitsbythehumanfacesandhousesisshowninFigure1D,depictingagainaclear356

butcertainlynotperfectseparationofthetwocategories.357

358

Asystemthathasnosemanticknowledgeaboutfacesorhousescanstill359

separatethetwocategoriesquitewell.Giventheabundantliteratureonstudies360

aboutfacesversushouses,itisworthfurtherscrutinizingthisresult.The361

photographsintheImageNetdatasetaretakenfromthewebandtherearea362

handfulofhumanfacesandhousesincludedinthe6categorieschosenhere.The363

smallnumberofhumanfacesandhousesarenotuniformlydistributedamongthose364

6categoriesandcouldintroduceasmallbias.Yet,removingthosefewhumanfaces365

andhousesdoesnotchangetheresults.Aficionadostotheideathathumanfaces366

constituteaspecialgroupmightarguethattheimagesofLabradordogsdocontain367

animalfacesandthereforethe“Labrador”fcunitmayhelptheclassifierseparate368

facesfromhouses.Toevaluatethispossibility,wecomputedthesignaltonoiseratio369

foreachofthe6fcunitsindiscriminatingfacesversushouses.Thebestfcunitwas370

unitnumber4(theonethatshowedstrongeractivationbyimagesofsportscars),371

Page 13: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

closelyfollowedbyunitnumber5(roses).Theworstfcunitwasunitnumber3(fire372

ants),followedbyunitnumber1(cells).Inotherwords,theLabradorfcunitisnot373

theonethatcontributesmosttotheseparationofhumanfacesversushouses.The374

activationpatternoffcunitnumber4(sportscars)inresponsestohumanfacesand375

housesisshowninFigure1E.Thisfcunitshowedaclearseparationofthetwo376

imagecategories,respondingstrongertoimagesofhumanfaces(meanactivation=377

0.47±1.72)comparedtohouses(meanactivation=-1.54±1.18).Aspointedout378

earlierinconnectionwithFigure1C,thedistributionofresponsesforthetwo379

categoriesclearlyoverlapped.380

381

Nowconsideranexperimentwithactualneuronsstudyingtheresponsesto382

imagesoffacesversushouses.Recordingtheactivityofaneuronthatbehavedlike383

fcunit4,inanexperimentsimilartotheoneinFigure1E,aninvestigatormightbe384

temptedtoarguethattheneuronrepresentsthesemanticconceptoffaces.Yetfc385

unit4isclearlymorestronglytunedtoimagesofsportscars(Figure1C,fourth386

subplot):themeanactivationinresponsetosportscarswas4.59±2.27,whichis387

abouttentimeslargerthanthemeanactivationinresponsetohumanfaces(0.47±388

1.72).Thereisnothingparticularlyspecialaboutthisunit;infact,allfcunitsexcept389

unitnumber3(fireants)showedastatisticallysignificantdifferentiationbetween390

imagesofhumanfacesversushouses.Tofurtherdispelanydoubtsthatthe391

Labradorimagesareplayinganyroleinhere,weranaseparatesimulationwhere392

wetrainedthesamearchitectureinFigure1Afromscratchwithonly2fcoutput393

unitstodiscriminateimagesofdesks(synsetnumbern03179701)versusimagesof394

friedrice(synsetnumbern07868340).Thealgorithmachievedanaccuracyof98%395

(chance=50%).Thesetwofcunitscouldbedescribedasa“deskunit”anda“fried396

riceunit”.Thepatternofactivationofthosetwofcunitsinresponsetoimagesof397

humanfacesandhouses(withoutanyretrainingofthenetwork)wasableto398

distinguishthemwith73%accuracy.Thedeskunitshowedanactivationof2.49±399

1.55inresponsetoimagesofhumanfacesandanactivationof0.98±1.11in400

responsetoimagesofhouses,clearlydifferentiatingthetwocategories.Thefried401

Page 14: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

riceunitshowedanactivationof-2.32±1.37inresponsetoimagesofhumanfaces402

versusanactivationof-1.13±1.11forimagesofhouses,clearlydifferentiating403

betweenthetwocategories.Insum,measuringhigheractivationforpicturesofone404

categoryversusothers(e.g.sportscarsversusrosesorfacesversushouses),inand405

ofitself,shouldnotbetakentoimplyanytypeofsemanticrepresentation.406

407

OnemaystillwanttomaintainthatthefcunitsinFigure1Aencodesome408

flavorofsemantics.Afterall,athresholdedversionoftheactivityofthoseunitsis409

sufficienttoprovideacategoricalimagelabel.Furthermore,thoseunitsarecapable410

ofacertaindegreeofabstractioninthesensethattheycanlabelnovelimagesthat411

themodelhasneverseenbeforeintothose6categories.Suchaversionofsemantics412

couldperhapsbebestdescribedasconcretevisualshapesemantics,asopposedto413

someabstractversionofsemanticsthattranscendsvisualfeatures.414

415

Whatarethepreferredstimuliforvisualneurons?416

WhatdothosefcunitsinFigure1Aactuallywant?Thatis,whattypesof417

imageswouldtriggerhighactivationinthosefcunits?Weknowalreadyfrom418

Figure1Cthatimagesofcellsleadtohighactivationinfcunit1,imagesof419

Labradorsleadtohighactivationinfcunit2,etc.Therefore,itseemsreasonableto420

arguethatfcunit1“wants”imagesofcells,fcunit2“wants”imagesofLabradors421

andsoon.Onemightevengoontodescribefcunit2asa“Labradorunit”,aswe422

havebeendoing.Butisitpossiblethatthereexistotherimagesthatleadtoeven423

higheractivationofthosefcunits?Toinvestigatethisquestion,weusedtheAlexnet424

model(Krizhevskyetal.,2012),pre-trainedontheImageNetdataset(Russakovsky425

etal.,2014).Weconsideredtwooftheoutputunits(layerlabeledfc8inAlexnet).426

Thesameanalysescanbeperformedforanyotherlayerbutwefocusonthe427

classificationlayerbecausethisisthestagethatwouldpresumablycontainthe428

highestdegreeofcategoricalinformation.Forillustrationpurposes,weshowthe429

activationoffc8unitnumber209(Figure2A)andfc8unitnumber527(Figure2B)430

inresponsetofourcategoriesofstimuli:Labradors,fireants,desksandsportscars.431

Page 15: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Asexpectedbasedonthewaythatthemodelwastrained,the“Labradorunit”(unit432

209)showedlargeractivationforimagescontainingLabradorscomparedtothe433

otherimages(Figure2A).Similarly,the“Deskunit”(unit527)showedlarger434

activationforimagescontainingdeskscomparedtotheotherimages(Figure2B).435

ThisistheequivalentoftheresultspresentedinFigure1C.Next,weusedthe436

“DeepDream”algorithmtogenerateimagesthatleadtohighactivationforthosefc437

units(Mordvintsevetal.,2015).Essentially,theDeepDreamalgorithmusesthe438

networkinreversemode.Insteadofgoingfrompixelstothefeaturerepresentation439

inagivenunitinthenetwork,DeepDreamgoesfromthefeaturerepresentationina440

givenunitbacktopixels,generatingimagesasitsoutput,andoptimizingthose441

imagesineachiterationtoelicitahighactivationinthechosenunit.Using442

DeepDreamtogenerateimagesthatleadtohighactivationforthe“Labradorunit”443

producedtheimageshowninFigure2C.Theactivationstrengthofthe“Labrador444

unit”inresponsetotheimageinFigure2Cyieldedtheactivationstrengthshownby445

thegreenandbluesymbolsinFigure2A,dependingonthesize(thebluesymbol446

correspondstothesameexactsizeasalltheotherimages).TheimageinFigure2C447

thustriggeredhigheractivationthananyofthe1,846photographsofLabradors448

(eventhoughthosephotographswereusedtotrainthenetwork).Theimagein449

Figure2Ccouldwellbedescribedusingwordsbyahumanobserverascontaining450

multiplerenderingsofdistorted,sketchy,blurred,Labrador-likepatches.Similar451

resultsareshownforthe“Deskunit”inFigures2Band2D.Aftersomesquinting,it452

isalsopossibletodiscernsomeresemblancetodesk-likefeaturesinFigure2D,but453

itislessobvious.Insum,whatfcunitswantisanimagerenderingcomplexfeatures,454

featuresthatarenoteasilymappedontoEnglishwords,thoughtheycertainly455

resembleaspectsoftheactualphotographsusedtotrainthealgorithms.Thosefc456

unitsrespondmoststronglytoimagesthatcannotbeobviouslypredictedbythe457

labelsassignedtothem.Whileonemaystillwanttorefertothoseunitsas458

“Labradorunits”and“Deskunits”,itisclearthattherearemanyimagesthatwould459

notbelabeledasLabradorsordesksbyanyhumanobserver,andyettheytrigger460

higheractivationinthoseunits,evenhigherthanreal-worldphotographs461

containingthosecategories.462

Page 16: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Tosummarize,typicalNeuroscienceexperimentsarelimitedbyhowlongit463

ispossibletorecordfromaneuron.Investigatorsmustmakehardchoicesabout464

whichstimulitopresent.Thereisarichandexcitingliteraturewithmany465

experimentsshowingthatneuronalresponsescandiscriminateamongdifferent466

categoriesofstimuli.AsillustratedherebythecomputationalmodelsinFigures1467

and2,thesetypesofresponsesdonotimplyanytypeofsemanticencoding.Simple468

computationalmodelscanalsoyieldresponsesthatdistinguishdifferentcategories469

(Figure1B),thoseresponsesarenotall-or-none(Figure1C),category470

differentiationcanbedemonstratedusingunitsthatareknowntobeclearly471

semanticallyunrelatedtothosecategories(Figures1D-E),andcompleximagesthat472

donotdirectlymapontoanysemanticmeaningcantriggerhigheractivationin473

putativecategoricalunits(Figure2).474

475

Modelsversusrealbrains476

477

Thesedeepconvolutionalbottom-upcomputationalmodelscastadoubton478

claimsaboutsemanticencodingbasedoncategory-selectiveresponsesandprovide479

anullhypothesistocompareagainst.Yet,thesecomputationalmodelsareafarcry480

fromrealbiologicalsystemsinallsortsofwaysandthereforeitisfairtoquestionto481

whatextentwecanextrapolateconclusionsfromthesecomputationalmodelstothe482

typesofrepresentationsmanifestedbyrealneurons.Advocatesofsemanticswould483

rightlyarguethattheexercisesintheprevioussectionmerelyreflecttoymodels484

andthatitremainsunclearwhetherthesameobservationsapplytoactualneuronal485

recordingsfromrealbrains.Theobservationthatthesemodelscanreproduce486

certainaspectsofselectivityinneurophysiologicalrecordingsdoesnotimplythat487

onecanruleoutthepresenceofsemanticinformationinneuraldata.488

489

Althoughdeepconvolutionalmodelsarestillratherprimitiveandfailto490

incorporatemuchofthearchitectureandfunctionofbiologicalcircuits,recent491

studieshaveshownthatthesemodelscanexplainarelativelylargefractionofthe492

varianceinneuronalresponses(Yaminsetal.,2014;Maheswaranathanetal.,2018).493

Page 17: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Infact,category-selectiveresponsesfrombiologicalneuronsalsoshowthetypeof494

propertiesillustratedinFigure1(e.g.,(Vogels;Kreimanetal.;Sigalaand495

Logothetis;Hungetal.))andthereforethesamecautionarynotesshouldbeusedin496

interpretingneuronalselectivity.Furthermore,recentworkhasshownthatitis497

possibletogenerateeffectivestimuliforbiologicalneuronsinafashionsimilarto498

theprocedureillustratedinFigure2(Ponceetal.,2019).Theauthorsuseda499

proceduresimilartotheDeepDreamalgorithmdiscussedearliertogenerateimages500

whilerecordingneuronalresponsesusedtoguidetheevolutionofimagestriggering501

highfiringrates.Theresultingsetofsyntheticimagestriggeredactivationin502

biologicalneuronsthatwasasstrongasorinmanycasesevenstrongerthannatural503

stimuli,similartothesyntheticimagescreatedinFigure2.Inotherwords,for504

biologicalneuronsalongtheventralvisualcortex,thetypeofstimulithattrigger505

strongestactivationarenotrealworldobjectswithsemanticmeanings,butrather506

complexshapeswithfeaturessharedwithrealworldobjectsbutdistinctand507

abstractandwithoutanyobvioussemanticmeaning(Ponceetal.,2019).508

509

Absenceofcompellingevidenceforsemanticencodingdoesnotconstitute510

evidenceofabsenceofsemantics.Thefactthatwecannotconcludethatthereis511

abstractsemanticinformationbyobservingtheresponsestoagivencategory512

versusothersinthistypeofexperimentscertainlyshouldnotbeinterpretedto513

implythatsemanticinformationdoesnotexist.Thepointintheprevioussectionis514

thatmerelyshowingdifferentialpatternsofactivitybetweentwo(ormore)515

categoriesofstimuliismoreofareflectionaboutthechoiceofstimuliandaboutthe516

waytheimagesweregatheredratherthananymysteriousnotionofabstract517

meaning.Thefamilyofdeepconvolutionalnetworkmodelsshouldbeusedasanull518

hypothesisforanystatementconcerningtherepresentationofabstractmeaningin519

experimentsonvisualimages.Wecanthusdefineabstractsemanticencodingas520

visualdiscriminationsthatcannotbeaccountedforbythefamilyofnull521

computationalmodelsofvisualrecognition.522

523

Page 18: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

Ratherthandiscussingpresenceorabsenceofsemanticinformationina524

binaryfashion,itisprobablymoreusefultoconsiderdifferentlevelsofabstraction525

andinvariance.Atthebottomlevelisthenotionoftemplatematching,i.e.aneuron526

thatrespondswhenaspecificcombinationofpixelsisshownwithinitsreceptive527

field.Increasingthedegreeofinvariance,wecanconsideraneuronthatresponds528

withapproximatelythesameintensitywhenthestimulusshowssmallchangessuch529

asacomplexcellinprimaryvisualcortexanditsresponsestoanoptimallyoriented530

baratdifferentpositionswithinthereceptivefield.Increasingthedegreeof531

abstraction,wecanconsiderneuronsininferiortemporalcortexthatshow532

tolerancetosomeamountof2Drotationoftheirpreferredstimuliandneuronsthat533

respondtovisuallysimilarexemplarsfromagivencategorysuchastheones534

modeledintheprevioussection.Asignificantstepupwardsininvariancewouldbe535

tofindneuronsthatshowasimilarresponsetoatennisball,atennisracquet,a536

tenniscourt,atennisskirtandthewordWimbledon.Tothebestofmyknowledge,537

thereisnoevidenceyetforsucharepresentation.538

539

Insearchofabstractioninthebrain540

541

Whattypeofexperimentaldatawouldprovideevidenceinfavorofabstract542

semanticinformation?Returningtotheexamplesusedinthedefinitionof543

semantics,itwouldbenicetoshowneuronalresponsesthataresimilarforatennis544

ballandatennisracketandyetverydifferentbetweenatennisballandalemon.In545

otherwords,itwouldbenicetoshow(i)imagesthathaveasimilarvisual546

appearance(e.g.,atennisballandalemon)andyettheytriggerverydifferent547

responses,and(ii)imagesthatarevisuallydissimilar(e.g.,atennisballandatennis548

racket)andyettheytriggerverysimilarresponses.549

Anelegantstepinthisdirectionwascarriedoutbygeneratingmorphs550

betweensyntheticimagesofcatsanddogsandtrainingmonkeystobehaviorally551

separatethem(Freedmanetal.,2001).Theauthorscouldtitratethevisual552

similarityofthestimuliandseparatepurelyvisualshapefeaturesfromthetask-553

relevantcategoricaldifferentiationbetweenthem.Theauthorsdescribedthe554

Page 19: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

activityofneuronsinpre-frontalcortexthatcorrelatedwiththecategorical555

distinctionsratherthanthevisualappearancedistinctionsbetweenstimuli.While556

pre-frontalcortexneuronsbetterreflectedsuchtask-dependentabstract557

information,neuronsininferiortemporalcortexalsoshowedevidenceforencoding558

thecategoricalboundaries(Meyersetal.,2008).Furthermore,monkeyscouldbe559

retrainedtochangetheirdefinitionofthecategoricalboundariesandpre-frontal560

cortexneuronsalteredtheirtuningtoreflectthenewcategoricaldefinitions561

imposedbythetaskdemands(Cromeretal.,2010).562

Anothersetofintriguingresultscomesfromneuralrecordingsinhuman563

epilepsypatients.Somepatientssufferingfrompharmacologicallyintractable564

epilepsyareimplantedwithelectrodesaspartoftheclinicalprocedureforpotential565

surgicalresectionoftheseizurefocus.Thisclinicalsituationprovidesarather566

uniqueopportunitytorecordthespikingactivityofneuronsinthehumanbrain,567

particularlyinareasofthemedialtemporallobeincludingthehippocampus,568

entorhinalcortex,parahippocampalgyrusandtheamygdala(Engeletal.,2005;569

Kreiman,2007;MukamelandFried,2012;Friedetal.,2014).Thislineofresearch570

hasgeneratedobservationsleadingtoclaimsaboutcategoricalinvariance(Kreiman571

etal.,2000b;Mormannetal.,2011).Therehavealsobeenresponsestospecific572

individualsortospecificlandmarks(QuianQuirogaetal.,2005).Thesestudiesare573

subjecttothesametypeofcaveatshighlightedintheprevioussection.However,in574

severalofthosecases,theinvariantresponsesweretriggeredbysetsofimagesthat575

wereverydifferentfromeachotherbasedonvisualinspection(therewasno576

quantitativedocumentationofvisualshapesimilaritybasedoncomputational577

models).Thesubjectivevisualdissimilarityofthosestimulisuggeststhatitwould578

bedifficulttoaccountforthoseresponsespurelybasedonthetypeofvisual579

similaritydescribedbythenullfamilyofstandardmodels.Particularlystrikingare580

thecaseswheretheneuronsrespondedtotheimageofaparticularindividualas581

wellastextversionoftheirname(QuianQuirogaetal.,2005),andcaseswherethe582

neuronsrespondedinaselectivefashionduringvisualimageryintheabsenceof583

anyvisualinput(Kreimanetal.,2000a;Gelbard-Sagivetal.,2008).Theactivityof584

humanmedialtemporallobeneuronstakenasawholethereforereflectsahigh585

Page 20: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

degreeofabstraction.Interestingly,theseresponsestendtooccurratherlateinthe586

game,arisingsomewherebetween200and300millisecondsafterstimulusonset587

dependingonthespecificarea,whichisattheveryleast50-150millisecondsafter588

theselectivevisualresponsesdescribedinbothmonkey(Eskandaretal.,1992;589

Keysersetal.,2001;Hungetal.,2005)andhuman(Liuetal.,2009)inferior590

temporalcortex.Additionally,bothhumans(Thorpeetal.,1996)andmonkeys591

(Fabre-Thorpeetal.,1998)canbehaviorallycategorizeimageswellbeforetheonset592

ofthoseresponses.Thus,thesemedialtemporalloberesponsesaremorelikelyto593

reflecttheencodingofemotionalinformationandtheformationofepisodic594

memories(bothofwhicharelikelytodependonsemanticencodingofinformation),595

ratherthanthevisualcategorizationperse.596

Accordingtothebroaddefinitionofsemanticsasaspectsofneuronal597

responsesthatcannotbeaccountedbythenullstandardmodelsofvisual598

recognition,multiplestudieshaveshowntaskdependentmodulationof599

neurophysiologicalresponsesthroughoutvisualcortex.Forexample,inanelegant600

study,neuronsinprimarycortexrespondeddifferentlytothesamestimuluswithin601

theirreceptivefielddependingonwhethertheinformationwasrelevantornotfor602

thecurrenttask(Lietal.,2004).Task-dependentexpectationscanalsomodulate603

responsesallthewaydowntoV1neurons(GilbertandLi,2013).Satisfyingsuch604

taskdemandscanbeconsideredanimportantaspectofabstractioninthesenseof605

consideringtheincominginputsinthecontextofcurrentgoals.606

607

Semanticsandtheleastcommonsense608

Commonsense,orgeneralsemanticknowledgeabouttheworld,ishardto609

find.Thedefinitionofsemanticsincludinglinguistic-likeinformation,atleasttaken610

literally,suggeststhatweshouldbelookingforahighlevelofabstraction,beyond611

whatcanbedescribedbycurrentvisualobjectrecognitionmodels.Onepractical612

issuetotacklesemanticsisthatitisdifficulttostudylanguageinnon-human613

animals.Strangely,thereareeveninvestigatorsthathaveclaimedthatlanguageis614

uniquetohumans(BerwickandChomsky,2015).Additionally,thereisminimaldata615

onsingleneuronresponsesinlanguageareasinthehumanbrain.Onemayimagine616

Page 21: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

thatanylinguisticinformationfrommedialtemporallobestructures,fromtask-617

dependentrepresentationsinpre-frontalcortex,orfromlanguageareas,618

mayverywellpropagatebacktoventralvisualcorticalareasanditmightbe619

possibletodiscernthosetop-downsemanticinfluencesinvisualcortex.620

Inthespiritofstimulatingfurtherdiscussionsandfuturework,weconclude621

withabriefdesiderataofexperimentsandmodelstofurtherourunderstandingof622

whatvisualcorticalneuronsreallywantandtheroleofsemanticinformation.623

[1]Computationalmodelsshouldplayanintegralpartinthedesignofvisual624

experimentstoelucidatewhatneuronswant.Thefamilyofdeepconvolutional625

networkmodelsprovidesareasonablenullhypothesistostartwith.Themodelscan626

beusedtoquantifywhatfractionofneuronalresponsevariabilitycanbeexplained,627

butalsotogenerateimagesanddesigntheexperimentsthemselves.Asanexample628

ofthisapproach,Figure2illustratesawayinwhichamodelcangenerateimages629

thattriggerhighactivationinitsunitsanditwillbeinterestingtofurtherevaluate630

thislineofreasoninginneuronalrecordings.631

[2]Giventhelimitedamountofdatathatwecanacquireforagivenneurondespite632

laboriousandheroicexperiments,weshouldbeopentotheideathatwehaveyetto633

uncoverwhatneuronstrulywant.Ourunderstandinganddescriptionofthetuning634

propertiesofneuronsalongventralvisualcortexmayhavetobesignificantly635

revisited.Twoimportantrecentdevelopmentsmayaccelerateprogress:theadvent636

ofsophisticatedcomputationalmodelsthatcanprovidequantitativehypothesisfor637

testingbeyondclassicalexperimentaldesigns,andtheexperimentalpossibilityof638

holdingneuronalrecordingsforprolongedperiodsoftime(McMahonetal.,2014).639

[3]Taskdemandsseemtoplayacriticalroleindynamicallyshapingneuronal640

responsesbeyondthedimensionsthatarepurelydictatedbysensoryinputs.As641

oneexampleofarecentsurprisingfindinginthisdirection,neuronsinrodent642

primaryvisualcortexarestronglymodulatednotonlybythevisualinputsbutalso643

bythespeedatwhichtheanimalismoving(NiellandStryker,2010).Thereare644

plentyofopportunitiestofurtherinvestigatehowtop-downmodulationcan645

dynamicallyrouteinformationaccordingtothecurrentbehavioralgoals.646

Page 22: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

[4]Touncoversemanticencoding,wewouldliketoensurethattheneuronal647

responsescannotbeexplainedbythenullfamilyofmodels.Aneuronencoding648

semanticinformationshouldshowasimilarresponsetoimagesthatsharemeaning649

butwhichhavenosimilarityintheirappearance.Additionally,suchaneuronshould650

showadifferentresponsetoimagesthatarevisuallysimilarbutdonotsharethe651

samemeaning.652

[5]Anotherimportantquestionforfutureresearchistoelucidatetheneuronal653

mechanismsofhowabstractioncanbelearnt.Therehasbeenextensivework654

showingthatvisualcorticalneuronscanchangetheirresponsesasaconsequenceof655

associationsformedbydifferentstimuli(Miyashita,1988;HiguchiandMiyashita,656

1996;Messingeretal.,2001;Suzuki,2007).Extendingsuchmechanismsmightlead657

totheformationofsemanticlinkssuchasthoseestablishedbythestatisticalco-658

occurrencesoftennisballs,racquets,courts,andskirts.659

660

Dataavailability661

AllthecodeusedtogenerateFigures1and2,isavailablefordownloadfrom:662

http://klab.tch.harvard.edu/resources/Categorization_Semantics.html663

WecannotprovidetheimagesusedintheexperimentsinFigures1and2.However,664

weprovidethesynsetidentificationnumbers,whichcanbeusedtofreelydownload665

alltheimagesfromthefollowingsite:666

http://image-net.org/667

668

669

670

Page 23: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

FiguresCaptions671

672Figure1673

A.Simplemulti-layerconvolutionalnetworkconsistingofaninputlayer,3674convolutionallayersandafullyconnectedclassificationlayerthatclassifiesimages675intooneof6possiblecategories:cells,Labradors,fireants,sportscars,rosesandice676(exampleimagesfromthosecategoriesareshowninpartB).Thenetworkwas677trainedviabackpropagationtooptimizeclassificationofimagesbelongingtothose678

Page 24: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

6categories.B.Dimensionalityreductionusingstochasticembedding(vander679MaatenandHinton,2008)oftheactivationpatternforthe6fclayerunitsfrompart680Ainresponsetoeachoftheimages.Thecolorofeachdotreflectstheimage681category.C.Activationstrengthforeachofthe6fcunitsinresponsetoallthe682images.Theimagecategoriesareseparatedbyverticaldottedlines.Theimages683fromthecategoryelicitingthestrongestactivationforeachofthefcunitsisshown684incolor,withthecolorsmatchingtheonesinpartB(e.g.,fcunit1showedstronger685activationtoimagescorrespondingtothecellcategory).D.Dimensionality686reductionusingstochasticembeddingoftheactivationpatternforthe6fclayer687unitsfrompartAinresponsetoimagesoffaces(red)orhouses(blue).Thenetwork688wasnottrainedtorecognizeeitherfacesorhouses.Yet,asupportvectormachine689classifierwithalinearkernelcouldseparatethetwocategories(emptycircles690representwronglyclassifiedimagesandfilledcirclesrepresentcorrectlyclassified691images).E.Activationpatternoffcunitnumber4(theoneshowingstrongest692responsestosportscars)inresponsetoalltheimagescontainingfaces(red)or693houses(blue).Thehorizontaldashedlineindicatestheaverageresponses.Allthe694parametersandsourcecodetogeneratetheseimagesareavailablein695http://klab.tch.harvard.edu/resources/Categorization_Semantics.html696697

698

Page 25: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

699Figure2700

A.Activationofunitcorrespondingtochannel209inlayerfc8inAlexnet701(Krizhevskyetal.,2012)inresponseto1846imagesofLabradordogs(redcircles),702972imagesofants,1366imagesofdesks,and1165imagesofsportscars(black703circles).Theverticaldottedlinesseparatethedifferentimagecategories.This704neuralnetworkwastrainedviabackpropagationusing1000imagecategories,705includingthe4categoriesshownhere.Thechannelshownherecorrespondstothe706classificationunitforthelabel“Labradordog”;asexpected,activationforthose707imageswasgenerallylargerthanactivationforotherimages.B.SameasAforunit708correspondingtochannel527(Desk).C.ImagegeneratedusingDeepDreamfor709Alexnetchannel209inlayerfc8(Mordvintsevetal.,2015).D.Imagegenerated710usingDeepDreamforAlexnetchannel527inlayerfc8.TheimagesinC-Dledtothe711activationdenotedbythegreentrianglesinA-B.UponresizingtheimagesinC-Dto712bethesamesizeasalltheotherimagesinpartsA-B,thecorrespondingactivations713aretheonesshownbythebluesquaresinA-B.Alltheparametersandsourcecode714togeneratetheseimagesareavailablein715http://klab.tch.harvard.edu/resources/Categorization_Semantics.html716 717

Page 26: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

References718

719

AllisonT,GinterH,McCarthyG,NobreAC,PuceA,LubyM,SpencerDD(1994)Face720recognitioninhumanextrastriatecortex.JournalofNeurophysiology72171:821-825.722

BerwickR,ChomskyN(2015)WhyOnlyUs:LanguageandEvolution.Cambridge,723MA:MITPress.724

CarlsonET,RasquinhaRJ,ZhangK,ConnorCE(2011)Asparseobjectcodingscheme725inareaV4.CurrentBiology21:288-293.726

ChapmanB,StrykerM,BonhoefferT(1996)DevelopmentofOrientationPreference727MapsinFerretPrimaryVisualCortex.JournalofNeuroscience16:6443-7286453.729

ConnorCE,BrincatSL,PasupathyA(2007)Transformationofshapeinformationin730theventralpathway.Currentopinioninneurobiology17:140-147.731

CooganT,Burkhalter,A.(1993)HierarchicalOrganizationofAreasinRatVisual732Cortex.TheJournalofNeuroscience13:3749-3772.733

CromerJA,RoyJE,MillerEK(2010)Representationofmultiple,independent734categoriesintheprimateprefrontalcortex.Neuron66:796-807.735

DecoG,RollsET(2004)ComputationalNeuroscienceofVision.OxfordOxford736UniversityPress.737

DesimoneR,DuncanJ(1995)Neuralmechanismsofselectivevisualattention.738AnnualReviewofNeuroscience18:193-222.739

DesimoneR,AlbrightT,GrossC,BruceC(1984)Stimulus-selectivepropertiesof740inferiortemporalneuronsinthemacaque.JournalofNeuroscience4:2051-7412062.742

DiCarloJJ,ZoccolanD,RustNC(2012)Howdoesthebrainsolvevisualobject743recognition?Neuron73:415-434.744

EngelAK,MollCK,FriedI,OjemannGA(2005)Invasiverecordingsfromthehuman745brain:clinicalinsightsandbeyond.NatRevNeurosci6:35-47.746

EskandarEN,RichmondBJ,OpticanLM(1992)Roleofinferiortemporalneuronsin747visualmemory.I.Temporalencodingofinformationaboutvisualimages,748recalledimages,andbehavioralcontext.JNeurophysiol68:1277-1295.749

Fabre-ThorpeM,RichardG,ThorpeSJ(1998)Rapidcategorizationofnatural750imagesbyrhesusmonkeys.Neuroreport9:303-308.751

FellemanDJ,VanEssenDC(1991)Distributedhierarchicalprocessinginthe752primatecerebralcortex.Cerebralcortex1:1-47.753

FreedmanD,RiesenhuberM,PoggioT,MillerE(2001)Categoricalrepresentationof754visualstimuliintheprimateprefrontalcortex.Science291:312-316.755

FreemanJ,ZiembaCM,HeegerDJ,SimoncelliEP,MovshonJA(2013)Afunctional756andperceptualsignatureofthesecondvisualareainprimates.Nature757neuroscience16:974-981.758

FriedI,CerfM,RutishauserU,KreimanG(2014)Singleneuronstudiesofthehuman759brain.Probingcognition.Cambridge,MA:MITPress.760

Page 27: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

FukushimaK(1980)Neocognitron:aselforganizingneuralnetworkmodelfora761mechanismofpatternrecognitionunaffectedbyshiftinposition.Biological762Cybernetics36:193-202.763

GallantJL,BraunJ,VanEssenDC(1993)Selectivityforpolar,hyperbolic,and764Cartesiangratingsinmacaquevisualcortex.Science259:100-103.765

Gelbard-SagivH,MukamelR,HarelM,MalachR,FriedI(2008)InternallyGenerated766ReactivationofSingleNeuronsinHumanHippocampusDuringFreeRecall.767Science.768

GhoseGM,MaunsellJH(2008)Spatialsummationcanexplaintheattentional769modulationofneuronalresponsestomultiplestimuliinareaV4.Journalof770Neuroscience28:5115-5126.771

GilbertCD,LiW(2013)Top-downinfluencesonvisualprocessing.NatRevNeurosci77214:350-363.773

GrossCG(1994)Howinferiortemporalcortexbecameavisualarea.Cerebralcortex7745:455-469.775

HeK,ZhangX,RenS,SunJ(2015)Deepresiduallearningforimagerecognition.776arXiv1512.03385.777

HeK,GkioxariG,DollarP,GirshickR(2018)MaskR-CNN.In:IEEETrans.Pattern778Anal.MachIntell.779

HegdeJ,VanEssenDC(2007)Acomparativestudyofshaperepresentationin780macaquevisualareasv2andv4.Cerebralcortex17:1100-1116.781

HiguchiS,MiyashitaY(1996)Formationofmnemonicneuronalresponsestovisual782pairedassociatesininferotemporalcortexisimpairedbyperirhinaland783entorhinallesions.PNAS93:739-743.784

HubelD(1981)Evolutionofideasontheprimaryvisualcortex,1955-1978:abiased785historicalaccount.In:NobelLectures.786

HubelDH,WieselTN(1962)Receptivefields,binocularinteractionandfunctional787architectureinthecat'svisualcortex.TheJournalofphysiology160:106-788154.789

HubelDH,WieselTN(1968)Receptivefieldsandfunctionalarchitectureofmonkey790striatecortex.TheJournalofphysiology195:215-243.791

HungCC,CarlsonET,ConnorCE(2012)Medialaxisshapecodinginmacaque792inferotemporalcortex.Neuron74:1099-1113.793

HungCP,KreimanG,PoggioT,DiCarloJJ(2005)FastRead-outofObjectIdentity794fromMacaqueInferiorTemporalCortex.Science310:863-866.795

IsikL,SingerJ,MadsenJR,KanwisherN,KreimanG(2017)Whatischangingwhen:796Decodingvisualinformationinmoviesfromhumanintracranialrecordings.797Neuroimage:InPress.798

JonesJP,StepnoskiA,PalmerLA(1987)Thetwo-dimensionalspectralstructureof799simplereceptivefieldsincatstriatecortex.JNeurophysiol58:1212-1232.800

KanwisherN,McDermottJ,ChunMM(1997)Thefusiformfacearea:amodulein801humanextrastriatecortexspecializedforfaceperception.Journalof802Neuroscience17:4302-4311.803

KeysersC,XiaoDK,FoldiakP,PerretDI(2001)Thespeedofsight.Journalof804CognitiveNeuroscience13:90-101.805

Page 28: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

KianiR,EstekyH,MirpourK,TanakaK(2007)Objectcategorystructureinresponse806patternsofneuronalpopulationinmonkeyinferiortemporalcortex.J807Neurophysiol97:4296-4309.808

KobatakeE,TanakaK(1994)Neuronalselectivitiestocomplexobjectfeaturesin809theventralvisualpathwayofthemacaquecerebralcortex.JNeurophysiol81071:856-867.811

KochC(1999)BiophysicsofComputation.NewYork:OxfordUniversityPress.812KourtziZ,ConnorCE(2011)NeuralRepresentationsforObjectPerception:813

Structure,Category,andAdaptiveCoding.AnnuRevNeursci34:45-67.814KreimanG(2002)Ontheneuronalactivityinthehumanbrainduringvisual815

recognition,imageryandbinocularrivalry.In:Biology.Pasadena:California816InstituteofTechnology.817

KreimanG(2004)Neuralcoding:computationalandbiophysicalperspectives.818PhysicsofLifeReviews1:71-102.819

KreimanG(2007)Singleneuronapproachestohumanvisionandmemories.820Currentopinioninneurobiology17:471-475.821

KreimanG(2017)Anullmodelforcorticalrepresentationswithgrandmothers822galore.Language,CognitionandNeuroscience32:274-285.823

KreimanG,KochC,FriedI(2000a)Imageryneuronsinthehumanbrain.Nature824408:357-361.825

KreimanG,KochC,FriedI(2000b)Category-specificvisualresponsesofsingle826neuronsinthehumanmedialtemporallobe.NatureNeuroscience3:946-827953.828

KrizhevskyA,SutskeverI,HintonG(2012)ImageNetClassificationwithDeep829ConvolutionalNeuralNetworks.In:NIPS.Montreal.830

KubiliusJ,BracciS,OpdeBeeckHP(2016)DeepNeuralNetworksasa831ComputationalModelforHumanShapeSensitivity.PLoSComputBiol83212:e1004896.833

KufflerS(1953)Dischargepatternsandfunctionalorganizationofmammalian834retina.JournalofNeurophysiology16:37-68.835

LeopoldDA,BondarIV,GieseMA(2006)Norm-basedfaceencodingbysingle836neuronsinthemonkeyinferotemporalcortex.Nature442:572-575.837

LesicaNA,StanleyGB(2004)Encodingofnaturalscenemoviesbytonicandburst838spikesinthelateralgeniculatenucleus.JournalofNeuroscience24:10731-83910740.840

LiW,PiechV,GilbertCD(2004)Perceptuallearningandtop-downinfluencesin841primaryvisualcortex.Natureneuroscience7:651-657.842

LinsleyD,EberhardtS,SharmaT,GuptaP,SerreT(2017)Whatarethevisual843featuresunderlyinghumanversusmachinevision?In:IEEEICCVWorkshop844ontheMutualBenefitofCognitiveandComputerVision.845

LiuH,AgamY,MadsenJR,KreimanG(2009)Timing,timing,timing:Fastdecoding846ofobjectinformationfromintracranialfieldpotentialsinhumanvisual847cortex.Neuron62:281-290.848

LogothetisNK,SheinbergDL(1996)Visualobjectrecognition.AnnualReviewof849Neuroscience19:577-621.850

Page 29: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

MaheswaranathanN,KastnerDB,BaccusSA,GanguliS(2018)Inferringhidden851structureinmultilayeredneuralcircuits.PLoSComputBiol14:e1006291.852

MarkovNTetal.(2014)AWeightedandDirectedInterarealConnectivityMatrixfor853MacaqueCerebralCortex.Cerebralcortex24:17-36.854

McMahonDB,JonesAP,BondarIV,LeopoldDA(2014)Face-selectiveneurons855maintainconsistentvisualresponsesacrossmonths.Proceedingsofthe856NationalAcademyofSciencesoftheUnitedStatesofAmerica111:8251-8578256.858

MelB(1997)SEEMORE:Combiningcolor,shapeandtexturehistogrammingina859neurallyinspiredapproachtovisualobjectrecognition.NeuralComputation8609:777.861

MessingerA,SquireLR,ZolaSM,AlbrightTD(2001)Neuronalrepresentationsof862stimulusassociationsdevelopinthetemporallobeduringlearning.863ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesof864America98:12239-12244.Epub12001Sep12225.865

MeyersE,FreedmanD,KreimanG,MillerE,PoggioT(2008)DynamicPopulation866CodingofCategoryInformationinITCandPFC.JournalofNeurophysiology867100:1407-1419.868

MiyashitaY(1988)Neuronalcorrelateofvisualassociativelong-termmemoryin869theprimatetemporalcortex.Nature335:817-820.870

MordvintsevA,OlahC,TykaM(2015)DeepDream-acodeexampleforvisualizing871NeuralNetworks.In:GoogleResearch.MountainView:Google.872

MormannF,DuboisJ,KornblithS,MilosavljevicM,CerfM,IsonM,TsuchiyaN,873KraskovA,QuirogaRQ,AdolphsR,FriedI,KochC(2011)Acategory-specific874responsetoanimalsintherighthumanamygdala.Natureneuroscience87514:1247-1249.876

MovshonJA,NewsomeWT(1992)NeuralFoundationsofVisualMotionPerception.877CurrentDirectionsinPsychologicalScience1:35-39.878

MukamelR,FriedI(2012)Humanintracranialrecordingsandcognitive879neuroscience.Annualreviewofpsychology63:511-537.880

NassiJ,Gomez-LabergeC,KreimanG,BornR(2014)Corticocorticalfeedback881increasesthespatialextentofnormalizationFrontiersinSystems882Neuroscience8.883

NiellCM,StrykerMP(2010)Modulationofvisualresponsesbybehavioralstatein884mousevisualcortex.Neuron65:472-479.885

O'ConnellT,ChunMM,KreimanG(2018)Zero-shotneuraldecodingofbasic-level886objectcategory.In:Cosyne.Denver.887

OkazawaG,TajimaS,KomatsuH(2015)Imagestatisticsunderlyingnaturaltexture888selectivityofneuronsinmacaqueV4.ProceedingsoftheNationalAcademy889ofSciencesoftheUnitedStatesofAmerica112:E351-360.890

OlshausenB,FieldD(2004)Sparsecodingofsensoryinputs.Currentopinionin891neurobiology14:481-487.892

OlshausenBA,FieldDJ(1996)Emergenceofsimple-cellreceptivefieldproperties893bylearningasparsecodefornaturalimages.Nature381:607-609.894

Page 30: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

OlshausenBA,AndersonCH,VanEssenDC(1993)Aneurobiologicalmodelofvisual895attentionandinvariantpatternrecognitionbasedondynamicroutingof896information.JournalofNeuroscience13:4700-4719.897

PasupathyA,ConnorCE(2001)ShaperepresentationinareaV4:position-specific898tuningforboundaryconformation.JNeurophysiol86:2505-2519.899

PhillipsPJ,YatesAN,HuY,HahnCA,NoyesE,JacksonK,CavazosJG,JeckelnG,900RanjanR,SankaranarayananS,ChenJC,CastilloCD,ChellappaR,WhiteD,901O'TooleAJ(2018)Facerecognitionaccuracyofforensicexaminers,902superrecognizers,andfacerecognitionalgorithms.Proceedingsofthe903NationalAcademyofSciencesoftheUnitedStatesofAmerica115:6171-9046176.905

PonceCR,XiaoW,SchadePF,HartmannTS,KreimanG,LivingstoneM(2019)906Evolvingsuperstimuliforrealneuronsusingdeepgenerativenetworks.907Biorxiv10.1101/516484.908

QuianQuirogaR,ReddyL,KreimanG,KochC,FriedI(2005)Invariantvisual909representationbysingleneuronsinthehumanbrain.Nature435:1102-1107.910

RajalinghamR,IssaEB,BashivanP,KarK,SchmidtK,DiCarloJJ(2018)Large-Scale,911High-ResolutionComparisonoftheCoreVisualObjectRecognitionBehavior912ofHumans,Monkeys,andState-of-the-ArtDeepArtificialNeuralNetworks.913JournalofNeuroscience38:7255-7269.914

ReynoldsJH,HeegerDJ(2009)Thenormalizationmodelofattention.Neuron91561:168.916

RiesenhuberM,PoggioT(1999)Hierarchicalmodelsofobjectrecognitionincortex.917NatureNeuroscience2:1019-1025.918

RussakovskyO,DengJ,SuH,KrauseJ,SatheeshS,MaS,HuangS,KarpathyA,Khosla919A,BernsteinM,BergA,Fei-FeiL(2014)ImageNetLargeScaleVisual920RecognitionChallenge.In:CVPR:arXiv:1409.0575,2014.921

SerreT(2019)Deeplearning:thegood,thebadandtheugly.AnnualReviewof922VisionInPress.923

SerreT,KreimanG,KouhM,CadieuC,KnoblichU,PoggioT(2007)Aquantitative924theoryofimmediatevisualrecognition.ProgressInBrainResearch165C:33-92556.926

SheinbergDL,LogothetisNK(2001)Noticingfamiliarobjectsinrealworldscenes:927theroleoftemporalcorticalneuronsinnaturalvision.Journalof928Neuroscience21:1340-1350.929

SigalaN,LogothetisN(2002)Visualcategorizationshapesfeatureselectivityinthe930primatetemporalcortex.Nature415:318-320.931

SimoncelliE,OlshausenB(2001)NaturalImageStatisticsandNeural932Representation.AnnualReviewofNeuroscience24:193-216.933

SimonyanK,ZissermanA(2014)Verydeepconvolutionalnetworksforlarge-scale934imagerecognition.arXiv1409.1556.935

SugaseY,YamaneS,UenoS,KawanoK(1999)Globalandfineinformationcodedby936singleneuronsinthetemporalvisualcortex.Nature400:869-873.937

SuzukiWA(2007)Makingnewmemories:theroleofthehippocampusinnew938associativelearning.AnnalsoftheNewYorkAcademyofSciences1097:1-11.939

Page 31: Semantics in vision v2 - Harvard Universityklab.tch.harvard.edu › publications › PDFs › gk7786_temp.pdf9 ventral visual cortex, abstract meaning, categorization, neurophysiology

TanakaK(2003)Columnsforcomplexvisualobjectfeaturesintheinferotemporal940cortex:clusteringofcellswithsimilarbutslightlydifferentstimulus941selectivities.Cerebralcortex13:90-99.942

TangH,LotterW,SchrimpfM,ParedesA,OrtegaJ,HardestyW,CoxD,KreimanG943(2018)Recurrentcomputationsforvisualpatterncompletion.PNAS.944

ThomasE,vanHulleM,VogelsR(2001)Encodingofcategoriesbynoncategory-945specificneurosnintheinferiortemporalcortex.JournalofCognitive946Neuroscience13:190-200.947

ThorpeS,FizeD,MarlotC(1996)Speedofprocessinginthehumanvisualsystem.948Nature381:520-522.949

TsaoDY,FreiwaldWA,TootellRB,LivingstoneMS(2006)Acorticalregion950consistingentirelyofface-selectivecells.Science311:670-674.951

UllmanS,AssifL,FetayaE,HarariD(2016)Atomsofrecognitioninhumanand952computervisionPNAS.953

vanderMaatenL,HintonG(2008)VisualizingHigh-DimensionalDataUsingt-SNE.J954MachineLearningRes9:2579-2605.955

VaziriS,ConnorCE(2016)RepresentationofGravity-AlignedSceneStructurein956VentralPathwayVisualCortex.CurrentBiology26:766-774.957

VinjeWE,GallantJL(2000)Sparsecodinganddecorrelationinprimaryvisual958cortexduringnaturalvision.Science287:1273-1276.959

VogelsR(1999)Categorizationofcomplexvisualimagesbyrhesusmonkeys:Part2:960single-cellstudy.EuropeanJournalofNeuroscience11:1239-1255.961

WallisG,RollsET(1997)Invariantfaceandobjectrecognitioninthevisualsystem.962PROGRESSINNEUROBIOLOGY51:167-194.963

WuMC,DavidSV,GallantJL(2006)Completefunctionalcharacterizationofsensory964neuronsbysystemidentification.AnnuRevNeurosci29:477-505.965

YamaneY,CarlsonET,BowmanKC,WangZ,ConnorCE(2008)Aneuralcodefor966three-dimensionalobjectshapeinmacaqueinferotemporalcortex.Nature967neuroscience11:1352-1360.968

YaminsDL,HongH,CadieuCF,SolomonEA,SeibertD,DiCarloJJ(2014)969Performance-optimizedhierarchicalmodelspredictneuralresponsesin970highervisualcortex.ProceedingsoftheNationalAcademyofSciencesofthe971UnitedStatesofAmerica111:8619-8624.972

ZekiS(1983)Colorcodinginthecerebralcortex-Thereactionofcellsinmonkey973visualcortextowavelengthsandcolors.Neuroscience9:741-765.974

975