
INNOVATIVE METHODOLOGY

A passive, camera-based head-tracking system for real-time, three-dimensional estimation of head position and orientation in rodents

Walter Vanzella,1,3* Natalia Grion,1* Daniele Bertolini,1* Andrea Perissinotto,1,3 Marco Gigante,2 and Davide Zoccolan1

1Visual Neuroscience Laboratory, International School for Advanced Studies (SISSA), Trieste, Italy; 2Mechatronics Lab, International School for Advanced Studies (SISSA), Trieste, Italy; and 3Glance Vision Technologies, Trieste, Italy

Submitted 13 May 2019; accepted in final form 22 September 2019

Vanzella W, Grion N, Bertolini D, Perissinotto A, Gigante M, Zoccolan D. A passive, camera-based head-tracking system for real-time, three-dimensional estimation of head position and orientation in rodents. J Neurophysiol 122: 2220–2242, 2019. First published September 25, 2019; doi:10.1152/jn.00301.2019.—Tracking head position and orientation in small mammals is crucial for many applications in the field of behavioral neurophysiology, from the study of spatial navigation to the investigation of active sensing and perceptual representations. Many approaches to head tracking exist, but most of them only estimate the 2D coordinates of the head over the plane where the animal navigates. Full reconstruction of the pose of the head in 3D is much more challenging and has been achieved only in a handful of studies, which employed headsets made of multiple LEDs or inertial units. However, these assemblies are rather bulky and need to be powered to operate, which prevents their application in wireless experiments and in the small enclosures often used in perceptual studies. Here we propose an alternative approach, based on passively imaging a lightweight, compact, 3D structure, painted with a pattern of black dots over a white background. By applying a cascade of feature extraction algorithms that progressively refine the detection of the dots and reconstruct their geometry, we developed a tracking method that is highly precise and accurate, as assessed through a battery of validation measurements. We show that this method can be used to study how a rat samples sensory stimuli during a perceptual discrimination task and how a hippocampal place cell represents head position over extremely small spatial scales. Given its minimal encumbrance and wireless nature, our method could be ideal for high-throughput applications, where tens of animals need to be simultaneously and continuously tracked.

NEW & NOTEWORTHY Head tracking is crucial in many behavioral neurophysiology studies. Yet reconstruction of the head's pose in 3D is challenging and typically requires implanting bulky, electrically powered headsets that prevent wireless experiments and are hard to employ in operant boxes. Here we propose an alternative approach, based on passively imaging a compact, 3D dot pattern that, once implanted over the head of a rodent, allows estimating the pose of its head with high precision and accuracy.

head tracking; perceptual discrimination; place cell; pose estimation

INTRODUCTION

Careful monitoring and quantification of motor behavior is essential to investigate a range of cognitive functions (such as motor control, active perception, and spatial navigation) in a variety of different species. Examples include tracking eye movements in primate and nonprimate species (Kimmel et al. 2012; Payne and Raymond 2017; Remmel 1984; Stahl et al. 2000; Wallace et al. 2013; Zoccolan et al. 2010), monitoring whisking activity in rodents (Knutsen et al. 2005; Perkon et al. 2011; Rigosa et al. 2017), and tracking the position of virtually any species displaying interesting navigation patterns—from bacteria (Berg and Brown 1972) and invertebrate species (Cavagna et al. 2017; Garcia-Perez et al. 2005; Mazzoni et al. 2005; Mersch et al. 2013) to small terrestrial (Aragão et al. 2011; Tort et al. 2006) and aerial mammals (Tsoar et al. 2011; Yartsev and Ulanovsky 2013) and birds (Attanasi et al. 2014). In particular, studies in laboratory animals aimed at measuring the neuronal correlates of a given behavior require tools that can accurately track it in time and space and record it along with the underlying neuronal signals.

A classical application of this approach is to track the position of a light-emitting diode (LED), mounted over the head of a rat or a mouse, while recording the activity of place cells in hippocampus (O'Keefe and Dostrovsky 1971; O'Keefe and Nadel 1978) or grid cells in entorhinal cortex (Fyhn et al. 2004; Hafting et al. 2005), with the goal of understanding how space is represented in these brain structures (Moser et al. 2008, 2015). It is also common, in studies of spatial representations, to track the yaw of the head (i.e., its orientation in the horizontal plane where the rodent navigates; see Fig. 2C), to investigate the tuning of neurons in hippocampus, entorhinal cortex, and other limbic structures for head direction, speed, or angular velocity (Acharya et al. 2016; Kropff et al. 2015; Sargolini et al. 2006; Taube 2007). In these applications, the yaw is tracked through an overhead video camera imaging two LEDs of different colors (e.g., red and green), mounted over the head stage of the neuronal recording system and placed along the anteroposterior axis of the head. In some studies, this LED arrangement was also used to estimate the pitch of the head (i.e., its rotation about the interaural axis; see Fig. 2C), by measuring the distance between the two LEDs in the image plane (Bassett and Taube 2001; Stackman and Taube 1998).

* W. Vanzella, N. Grion, and D. Bertolini contributed equally to this work.

Address for reprint requests and other correspondence: D. Zoccolan, International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy (e-mail: [email protected]).

J Neurophysiol 122: 2220–2242, 2019. First published September 25, 2019; doi:10.1152/jn.00301.2019.


It is more difficult (and it has only rarely been attempted) to achieve a complete estimate of the pose and location of the head in the three-dimensional (3D) space—i.e., to simultaneously track the three Cartesian coordinates of the head and the three Euler angles that define its orientation: yaw, pitch, and roll (with the latter defined as the rotation about the head's anteroposterior axis; see Fig. 2C). Recently, two groups have successfully tracked the head of small, freely moving mammals in 3D through videography, by relying either on a single camera imaging a custom tetrahedral arrangement of four LEDs with different colors (Finkelstein et al. 2015), or on multiple cameras (up to four) imaging custom 3D arrangements of up to six infrared LEDs (Sawinski et al. 2009; Wallace et al. 2013). Other groups have used inertial measurement units (IMUs), such as accelerometers and gyroscopes, mounted over the head of a rat, to record its angular displacement and velocity along the three Euler rotation axes (Kurnikova et al. 2017; Pasquet et al. 2016).

All these approaches provide accurate measurements of head position and pose in 3D. However, having been developed as ad hoc solutions for specific experimental settings, their design is not necessarily optimal for every application domain. For instance, most of these systems were conceived to track the head of small mammals roaming over an open-field arena, where the relatively large size of the custom LED- or IMU-based headset (extending several centimeters above and/or around the animal's head) was not an issue in terms of encumbrance or obstruction. Moreover, these headsets need to be powered to operate. In general, this requires dedicated wires, which increase the stiffness of the bundle of cables connected to the headstage of the recording system and prevent performing fully unplugged recordings using headstages equipped with wireless transmitters (Pinnell et al. 2016; Szuts et al. 2011).

In this study, we tried to overcome these limitations by using a single overhead camera to passively image a 3D-printed structure, painted with a pattern of black dots over a white background and mounted over the head of a rat. The small size of the pattern (1.35 × 1.35 × 1.5 cm) makes it ideal for perceptual studies, where a rodent performs a discrimination task inside a narrow operant box, often with its head inserted through an opening or confined within a funnel, as in the studies of rodent visual perception recently carried out by our group (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Nikbakht et al. 2018; Rosselli et al. 2015; Tafazoli et al. 2012; Zoccolan et al. 2009) and other authors (Bossens and Op de Beeck 2016; De Keyser et al. 2015; Hauser et al. 2019; Horner et al. 2013; Kemp and Manahan-Vaughan 2012; Kurylo et al. 2015, 2017; Mar et al. 2013; Stirman et al. 2016; Vermaercke and Op de Beeck 2012; Yu et al. 2018). In what follows, besides describing in detail the equipment and the algorithm on which our method is based (MATERIALS AND METHODS) and validating its accuracy and precision (first part of the RESULTS and DISCUSSION), we provide a practical demonstration of how our head tracker can help in understanding 1) how a rat samples the sensory stimuli during a visual or auditory discrimination task; and 2) how hippocampal neurons represent head position over extremely small spatial scales around the area where the animal delivers its perceptual decision and collects the reward.

MATERIALS AND METHODS

Experimental rig for behavioral tests and in vivo electrophysiology. To demonstrate how our head tracker can be used in behavioral neurophysiology experiments with rodents, we applied it to track the head of a rat engaged in a two-alternative forced-choice (2AFC) discrimination task (Zoccolan 2015; Zoccolan and Di Filippo 2018). The animal was also implanted with an electrode array to perform neuronal recordings from the hippocampus.

The rig used to administer the task to the rat is similar to that employed in previous studies from our group (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009) and is illustrated in Fig. 1. Briefly, a custom-made operant box was designed with the CAD software SolidWorks (Dassault Systèmes) and then built using black and transparent Plexiglas. The box was equipped with a 42-in. LCD monitor (Sharp, PN-E421) for presentation of visual stimuli and an array of three stainless steel feeding needles (Cadence Science), 10 mm apart from each other, connected to three proximity sensors. The left and right feeding needles were connected to two computer-controlled syringe pumps (New Era Pump Systems NE-500), for automatic pear juice delivery. Each feeding needle was flanked by an LED on one side and a photodiode on the other side, so that, when the rat licked the needle, he broke the light beam extending from the LED to the photodiode and his responses were recorded. The front wall of the operant box had a rectangular aperture, which was 4 cm wide and extended vertically for the whole height of the wall, so as to allow room for the cables connecting the implanted electrode array to the preamplifiers of the acquisition system. The rat learned to insert his head through this aperture, so as to reach the feeding needles and face the stimulus display, which was located 30 cm from his nose. Two speakers positioned at the sides of the monitor were used for playing the sound stimuli. Stimulus presentation, input and output devices, and all task-relevant parameters and behavioral data acquisition were controlled with the freeware, open-source software MWorks (https://mworks.github.io/) running on a Mac Mini (Apple; solid cyan and black arrows in Fig. 1A).

During the neurophysiological experiments, stimulus presentation and collection of behavioral responses were synchronized with the amplification and acquisition of extracellular neuronal signals from hippocampus (red arrow in Fig. 1A), performed using a System Three workstation (Tucker-Davis Technologies, TDT), with a sampling rate of 25 kHz, running on a Dell PC. Specifically, MWorks sent a code with the identity and time of the stimulus presented in every trial to the TDT system via a UDP connection (dotted black arrow). The TDT system also controlled the acquisition of the frames by the camera of the head tracker, by generating a square wave that triggered the camera image uptake (dashed black arrow). The camera, in turn, generated a unique identification code for every acquired image. This code was saved by the PC running the head-tracking software (along with the Cartesian coordinates and orientation of the head) and also fed back to the TDT (dashed green arrow), which saved it in a data file along with the neurophysiological and behavioral recordings. The square wave had a fixed period and duty cycle, but it was adjustable by the user from the TDT graphical interface. In our experiments, the period was 23 ms (~50 Hz) and the duty cycle was around 33%.

Perceptual discrimination tasks, surgery, and neuronal recordings. One adult male Long-Evans rat (Charles River Laboratories) was used for the validation of the head-tracking system. The animal was housed in a ventilated cabinet (temperature controlled) and maintained on a 10:14-h light-dark cycle. The rat weighed ~250 g at the onset of the experiment and grew to over 500 g at the end of the study. The rat received a solution of water and pear juice (ratio 1:5) as a reward during each training session and, in addition, he had access to water ad libitum for 1 h after the training.


All animal procedures were in agreement with international and institutional standards for the care and use of animals in research and were approved by the Italian Ministry of Health: project N. DGSAF 22791-A, submitted on Sep. 7, 2015 and approved on Dec. 10, 2015 (approval N. 1254/2015-PR).

The rat was trained in a visual and sound-recognition task in the rig described in Experimental rig for behavioral tests and in vivo electrophysiology. The task required the animal to discriminate between either two visual objects or two sounds (Fig. 5A). Each trial started when the animal touched the central response port, which triggered the stimulus presentation. After the presentation of the stimulus on the monitor or the playback of the sound, the rat had to report its identity by licking either the left or the right response port. Each stimulus was associated with one specific port. Hence, only one action, either licking the left or the right port, was associated with the reward in any given trial. Correct object identification was followed by the delivery of the pear juice-water solution, while an incorrect response yielded a 1- to 3-s timeout, with no reward delivery and a failure tone played along with the flickering of the monitor from black to middle gray at 15 Hz. The stimuli were presented for 1 s or until the animal licked one of the lateral response ports, independently of the correctness of the choice.

The visual stimuli consisted of a pair of three-lobed objects, previously used by our group in several studies of visual perceptual discrimination in rats (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009). Specifically, each object was a rendering of a three-dimensional model built using the ray tracer POV-Ray (http://www.povray.org). The sound stimuli were two pure tones, at 600 Hz and 6,000 Hz, respectively (sound level: ~55 dB). Sounds were delivered from two speakers located symmetrically on both sides of the front part of the operant box, so that the sound level at the position of the animal's ears was equal when the animal's nose was near the central response port at the onset of the trial.

Once the rat reached ≥70% correct behavioral performance, it was implanted with an electrode array for chronic recordings. To this aim, the animal was anesthetized with isoflurane and positioned in a stereotaxic apparatus (Narishige, SR-5R). A craniotomy was made above the dorsal hippocampus and a 32-channel Zif-clip array (Tucker-Davis Technologies Inc.) was lowered into the craniotomy. Six stainless steel screws were inserted into the skull (three anterior, one lateral, and two posterior to the craniotomy) to provide anchoring for the implant cementation. Around the implant, we put hemostatic gelatin sponge (Spongostan Dental, Ethicon, Inc.) saturated with sterile sodium chloride solution to protect the brain from dehydration, and then silicon (Kwik-Cast, World Precision Instruments) to seal the craniotomy and protect it from the dental cement (Secure, Sun Medical Co. Ltd.) that was finally used to secure the whole implant to the skull. Hippocampal stereotaxic coordinates were −3.7 mm AP, −3.5 mm ML. The final depth of the electrodes to target CA1 was around −2.2 mm, and for the CA3 subregion it was around −3.4 mm. To place the dot pattern over the head of the rat for head tracking (see Head-tracking system), a connector complementary to the one mounted on the pattern was also cemented on the head, anterior to the electrode array (with a magnet on top; 210 g/cm2 holding force; see Fig. 2, A and B). The rat was given the antibiotic enrofloxacin (Baytril; 5 mg/kg) and carprofen (Rimadyl; 2.5 mg/kg, subcutaneous injection) for prophylaxis against infections and postoperative analgesia for the next 3 days postsurgery. The animal was allowed to recover for 7–10 days after the surgery, during which he had access to water and food ad libitum. The behavioral and recording sessions in the operant box were resumed after this recovery period. Action potentials (spikes) in the extracellular signals acquired by the TDT system were detected and sorted for each recording site separately, using Wave Clus (Quiroga et al. 2004) in MATLAB (The MathWorks). Online visual inspection of prominent theta waveforms, in addition to histology, confirmed the position of the electrodes.

Head-tracking system. The head tracker (Fig. 1) consists of an industrial monochromatic CMOS camera (Point Grey Flea3, 1.3 MP B&W, C-Mount), a far-red illuminator, a dedicated PC (Intel Xeon HP Workstation Z620 with a 2.5 GHz Xeon CPU and 16 GB of RAM), and a three-dimensional pattern of dots, mounted over the head of the animal and imaged by the overhead camera (Fig. 2A). The sensor size of the camera was 1/2″ and its resolution was 1,280 × 1,024 pixels.

Fig. 1. Illustration of the experimental rig where perceptual discrimination tests were combined with in vivo neuronal recordings and real-time tracking of the head of a rat subject. A: this schematic shows the operant box in which the rat was trained, along with its various components: the array of response ports (gray rectangle) for collection of behavioral responses, the stimulus display (orange rectangle), and the speakers for presentation of the visual and acoustic stimuli. The drawing also shows the computers used to run the behavioral task, record the neurophysiological signals, and acquire the video stream collected by the camera of the head tracker (green box). The arrows show the flow of the signals collected and commands issued by the three computers (see MATERIALS AND METHODS for details).

TDT, System Three workstation from Tucker-Davis Technologies. B: CAD rendering of some of the key elements of the experimental rig. The drawing allows appreciating the relative size and position of the operant box (light gray), stimulus display (dark gray), camera (dark green), and illuminator (light blue). The inset shows a detail of the 3D-printed block (cyan) holding the response ports and allows appreciating the size and typical position of the dot pattern, relative to the other components of the rig.


The camera mounted a TUSS Mega-pixel Fixed Focal Lens (2/3″, 50 mm, f/2.8 aperture). The illuminator was made of a matrix of 4 × 4 LEDs, with dominant wavelength at 730 nm (OSLON SSL 150, PowerCluster LED Arrays) and a radiance angle of [−40°, 40°], and was powered at 100–150 mW. It was mounted right above the stimulus display, oriented toward the operant box with an angle of ~50° with respect to the display (Fig. 1B). The camera was set in external trigger mode and the triggers it received from the TDT system (dashed black arrow in Fig. 1A) were counted, so that each frame was assigned a unique code, which was encoded in the first four pixels of each acquired image. In our validation analyses and in our tests with the implanted rat, the CMOS sensor integration time was set at 3 ms. This value was chosen after preliminary tests with the implanted rat performing the discrimination task, in such a way to guarantee that the images of the dot pattern acquired during fast sweeps among the response ports were not blurred. The camera was placed above the stimulus display, oriented toward the operant box with an angle of ~50° with respect to the display (Fig. 1B), although some of the validation measures presented in the RESULTS (Fig. 3) were obtained with the camera mounted in such a way to have its optical axis perpendicular to the floor of the measurement area.
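The exact bit layout of this embedded code is not documented here, but, assuming the simplest scheme (the four leading pixels of the monochrome frame holding the four bytes of a 32-bit counter, least-significant byte first), the frame number could be recovered with a few lines of Python. The pixel order and endianness in the sketch below are illustrative assumptions, not a description of the camera's actual encoding.

```python
import numpy as np

def decode_frame_id(frame: np.ndarray) -> int:
    """Recover the trigger-counter code assumed to be embedded in the first
    four pixels of a monochrome frame (hypothetical layout: four 8-bit pixels
    forming a 32-bit little-endian counter)."""
    b = frame.ravel()[:4].astype(np.uint32)
    return int(b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24))
```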

Obviously, following the generation of a trigger signal by the TDT, a propagation delay occurred before the camera started integrating for the duration of the exposure time. This delay was ~5 μs and was measured as suggested by the camera manufacturer. That is, we configured one of the camera's GPIO pins to output a strobe pulse and we connected both the input trigger pin and the output strobe pin to an oscilloscope. The time delay from the trigger input to the complete image formation was then given by the sum of the propagation delay and the exposure time, where the latter, as explained above, was fixed at 3 ms. Any other possible time delays, like the sensor readout and the data transfer toward the PC, do not affect real-time acquisition, since the content of the image and its frame number is entirely established at the end of the exposure time.
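In other words, the only latency that matters for timestamping the content of a frame is the sum of the propagation delay and the exposure time. A minimal sketch of this bookkeeping, using the values quoted above, is given below (function and variable names are ours).

```python
# Trigger-to-image latency budget described above. Readout and transfer
# delays are excluded because the image content is fixed at the end of the
# exposure; numerical values are those quoted in the text.
PROPAGATION_DELAY_S = 5e-6   # ~5 microseconds, measured with the strobe pin
EXPOSURE_TIME_S = 3e-3       # CMOS integration time used in the experiments

def image_formation_time(trigger_time_s: float) -> float:
    """Time at which the content of the triggered frame is fully determined."""
    return trigger_time_s + PROPAGATION_DELAY_S + EXPOSURE_TIME_S
```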

After the image has been read from the camera memory buffer, it becomes available to the head-tracking software (see Overview of the head-tracking algorithm) and, before any image processing occurs, the embedded frame number is extracted from the image and sent back to the TDT system via UDP protocol. Because of the data transfer over the USB 3.0 and UDP connections, the frame number information reaches the TDT system with an unpredictable time distance from the signal that triggered the acquisition of the frame itself. However, synchronizing the triggering pulses and the frame numbers is straightforward when considering the frame numbers and the number of trigger pulses generated. To validate our synchronization method, we conducted a double check by comparing the synch by frame number with the synch by timestamps. The second method uses the timestamp of the frame number received by TDT and looks for the trigger pulse that could have generated the frame in an interval of time [−0.0855, −0.0170 ms] before the frame number arrived. In particular, it assigns to the current frame number the most recent pulse that has not yet been assigned before. The two methods gave identical results over long sessions.
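A simple sketch of the timestamp-based matching is shown below. It is a minimal reimplementation of the rule described in the text (assign each received frame number to the most recent, not-yet-assigned trigger pulse falling within a given lag window); the window is passed as a parameter rather than hard-coded, since its exact values depend on the acquisition chain.

```python
import numpy as np

def match_frames_to_triggers(frame_arrival_times, trigger_times, min_lag, max_lag):
    """For each frame-number arrival time t, return the index of the most
    recent unassigned trigger pulse in [t - max_lag, t - min_lag], or None if
    no pulse is found in that window (all times in the same units)."""
    trigger_times = np.asarray(trigger_times, dtype=float)
    assigned = np.zeros(len(trigger_times), dtype=bool)
    matches = []
    for t in frame_arrival_times:
        in_window = ((trigger_times >= t - max_lag)
                     & (trigger_times <= t - min_lag) & ~assigned)
        candidates = np.flatnonzero(in_window)
        if candidates.size:
            idx = int(candidates[-1])   # most recent pulse not yet assigned
            assigned[idx] = True
            matches.append(idx)
        else:
            matches.append(None)
    return matches
```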

The head tracker works by imaging a small (1.35 × 1.35 × 1.5 cm), lightweight (~1.10 g), easy-to-place 3D-printed structure (shown in Fig. 2A), consisting of five coplanar black dots over a white background (~3.5 mm apart from each other), plus a sixth dot located over an elevated pillar (~5 mm tall). The five coplanar dots were arranged in an L-like shape, while the elevated dot was placed at the opposite side with respect to the corner of the L shape, so as to minimize the chance of occluding the other dots. The 3D structure holding the dots was designed with the CAD software SolidWorks (Dassault Systèmes) and then 3D printed. The dots were printed on a standard white paper sheet, which was then cut to the proper size and glued over the 3D-printed structure.

Fig. 2. Illustration of the 3D pattern of dots used for head tracking and of the Euler angles that define its pose in the camera reference system. A: CAD rendering of the pattern of dots that, once mounted over the head of a rat, allows its position and pose to be tracked. Notice the arm that allows the pattern to be attached to a matching base surgically implanted over the skull of the rat. B: different views of a rat with the dot pattern mounted over his head. C: definition of the angles of rotation of the reference system centered on the dot pattern (x′, y′, z′; purple arrows) with respect to the camera reference system (x, y, z; black arrows). The three Euler angles—yaw, pitch, and roll—are shown, respectively, by the red, green, and blue arrows. O′ indicates the origin of the pattern reference system. The brown arrow indicates the direction where the rat's nose is pointing and is parallel to the head's anteroposterior axis (i.e., to the x′ axis). The dashed brown line is the projection of the brown arrow over the (x, y) plane of the camera reference system.


This structure also included an arm, with a magnet at the bottom, which allowed the pattern to be mounted over an apposite support (holding a complementary magnet) that was surgically implanted on the rat's head (Fig. 2, A and B).

Since the head-tracking algorithm (see Overview of the head-tracking algorithm) requires precise knowledge of the spatial arrangement of the dots relative to each other, the distance between each pair of dots, as well as the height of the pillar, was carefully measured using a caliper and by acquiring high-magnification images of the pattern along with a precision ruler placed nearby (resolution 0.5 mm). This information is inserted in an apposite configuration file and is part of the calibration data that are required to operate the head tracker.

The other calibration information is the internal camera parameters K, which have to be precomputed by the camera calibration procedure (see APPENDIX A). Once this information is known, it can be stored and then loaded from the configuration file at the beginning of each behavioral/recording session, provided that the position of the camera in the rig is not changed and the same dot pattern is always used.
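As an illustration, the calibration data described above could be bundled as follows; the dot coordinates and intrinsic parameters below are placeholders, not the values actually used in the experiments.

```python
import numpy as np

# Hypothetical calibration bundle for the head tracker: the 3D coordinates of
# the six dots in the pattern reference system (five coplanar dots in an
# L-like arrangement, ~3.5 mm apart, plus the elevated dot on the ~5 mm
# pillar) and the intrinsic camera matrix K with distortion coefficients.
# All numerical values are illustrative placeholders.
CALIBRATION = {
    "dots_3d_mm": np.array([
        [0.0, 0.0, 0.0],
        [3.5, 0.0, 0.0],
        [7.0, 0.0, 0.0],
        [0.0, 3.5, 0.0],
        [0.0, 7.0, 0.0],
        [7.0, 7.0, 5.0],   # dot on top of the pillar
    ]),
    "K": np.array([[2600.0,    0.0, 640.0],
                   [   0.0, 2600.0, 512.0],
                   [   0.0,    0.0,   1.0]]),
    "dist_coeffs": np.zeros(5),
}
```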

Overview of the head-tracking algorithm. Our head-tracking algorithm consists of three different functional parts: 1) the Point Detection (PD) module; 2) the Points-Correspondences Identification (PcI) module; and 3) the Perspective-n-Point (PnP) module. These modules are executed in a sequence at each frame captured by the camera.

Fig. 3. Nominal precision of the head tracker in the camera reference system. A: CAD rendering of the breadboard with the grid of 5 × 5 locations where the dot pattern was placed to obtain the validation measurements shown in D. The pattern was mounted over an apposite pedestal (dark gray structure) with four feet, which were inserted in matching holes over the breadboard. B: CAD rendering of the stereotax arm that was used to displace the dot pattern vertically, to obtain the validation measurements shown in E. C: CAD rendering of the stereotax arm that was used to change the pitch and roll angles of the dot pattern, to obtain the validation measurements shown in F. D, left: validation measurements (red dots) obtained by placing the dot pattern on a grid of 5 × 5 ground-truth positions (grid intersections), over the floor of the testing area, using the breadboard shown in A. Each red dot in the main panel is actually a cloud of 30 repeated measurements, as can be appreciated by zooming into the area of the grid intersection at the submillimeter scale (inset). Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the 25 tested positions (see RESULTS for details). E, top: validation measurements (red dots) obtained by vertically displacing the dot pattern (relative to the floor of the testing area) by 34 consecutive increments, using the stereotax arm shown in B. Bottom: mean precision (left) and accuracy (right) of the head-tracker measurements over the 36 tested vertical displacements. F, left: validation measurements (red dots) obtained by setting the roll and pitch angles of the dot pattern to a combination of 13 × 9 ground-truth values (grid intersections), using the custom assembly shown in C. The inset allows appreciating the spread of the 30 measurements taken at one of the tested angle combinations. Note that, for some extreme rotations of the pattern, no measurements could be taken (grid intersections without red dots), since the dots on the pattern were not visible. Right: mean precision (left) and accuracy (right) of the head-tracker measurements over the set of tested angle combinations.


An estimate of the position/pose of the dot pattern (and, therefore, of the head) is computed only when the PD module is able to extract the positions of all the dots within the acquired image. In cases in which this condition is not satisfied, the PD algorithm signals and records the inability to successfully recover the position/pose of the pattern for the current frame.

The PD module is a fast feature-detection algorithm that eliminates false positive blobs and restricts the search space of point configurations by certain geometric constraints. Following the correct extraction of the dot positions by the PD algorithm, such positions must be univocally matched to the known geometry of the pattern by the PcI algorithm. That is, all six points, as they appear in the image, must be associated with the known 3D coordinates of the corresponding dots on the 3D pattern. Finally, the position and pose of the pattern reference system (x′, y′, z′; purple arrows in Fig. 2C) with respect to the camera reference system (x, y, z; black arrows in Fig. 2C) is obtained by solving the PnP problem. The whole algorithm was designed to process the images captured by the camera in real time, so as to output the estimated position and pose of the pattern without the need to store the images for off-line processing. This required designing the three different modules in such a way to maximize both the speed of processing and the accuracy of the estimates.
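To make the structure of the pipeline concrete, the sketch below reproduces its three stages for a single frame using standard OpenCV primitives. It is not the released implementation (the actual PD and PcI modules use custom feature extraction and geometric constraints); the blob-detector parameters and the ordering helper passed by the caller are illustrative assumptions.

```python
import cv2
import numpy as np

def track_frame(gray, dots_3d, K, dist_coeffs, pci_match):
    """One-frame sketch of the three-stage pipeline (PD -> PcI -> PnP).

    gray           : monochrome frame (uint8)
    dots_3d        : 6x3 array of dot coordinates in the pattern reference system
    K, dist_coeffs : intrinsic camera calibration
    pci_match      : callable standing in for the PcI module; it must reorder
                     the detected 2D points so that row i matches dots_3d[i]
    Returns (R, tvec) in camera coordinates, or None if the pose cannot be
    recovered for this frame.
    """
    # --- Point Detection (PD): find the six dark blobs on the white pattern.
    params = cv2.SimpleBlobDetector_Params()
    params.filterByColor = True
    params.blobColor = 0              # dark dots on a light background
    params.filterByArea = True
    params.minArea, params.maxArea = 5, 500
    detector = cv2.SimpleBlobDetector_create(params)
    keypoints = detector.detect(gray)
    if len(keypoints) != 6:
        return None                   # pose not estimated for this frame

    pts_2d = np.array([kp.pt for kp in keypoints], dtype=np.float64)

    # --- Points-Correspondences Identification (PcI): match image points to
    # the known pattern geometry (delegated to the caller in this sketch).
    pts_2d = pci_match(pts_2d)

    # --- Perspective-n-Point (PnP): pose of the pattern reference system
    # with respect to the camera reference system.
    ok, rvec, tvec = cv2.solvePnP(np.asarray(dots_3d, dtype=np.float64),
                                  pts_2d, K, dist_coeffs,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)        # 3x3 rotation matrix
    return R, tvec
```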

A detailed description of each processing module is provided in APPENDIX A. The source code of the tracking algorithm, along with a short example video to test it, is freely available at the following link: https://doi.org/10.5281/zenodo.3429870.

RESULTS

Our head tracker was developed in the context of a 2AFC perceptual discrimination experiment, involving the visual and auditory modalities. A scheme of the experimental setup is shown in Fig. 1A. The rat performed the discrimination task inside an operant box that was equipped with a monitor and two speakers for the presentation of the sensory stimuli. The task required the animal to insert the head through an opening in the wall facing the monitor and interact with an array of three response ports (i.e., three feeding needles, equipped with proximity sensors). Specifically, the rat had to lick the central needle to trigger the presentation of the stimulus. Afterward, he had to lick one of the lateral needles to report his perceptual choice and receive a liquid reward, in case of successful discrimination (see MATERIALS AND METHODS for details).

Figure 1A also shows the equipment used to acquire the neurophysiological signals and image the dot pattern mounted over the head of the rat to perform head tracking (see MATERIALS AND METHODS for a description).

Figure 1B shows a CAD drawing with a scaled representation of the key components of the rig and their relative position: the operant box, the monitor, the block holding the feeding needles, the overhead camera, and the case of the light source.

The key element of our head-tracking system is a 3D pattern of dots, mounted over the head of the animal and imaged by the overhead camera (Fig. 2A). At the beginning of each experimental session, the pattern is mounted on top of the head of the animal (Fig. 2B), using an apposite magnet that connects it to a base that was previously surgically implanted. As described in detail in the MATERIALS AND METHODS and APPENDIX A, the head-tracking algorithm detects the six black dots in the frames of the video stream and computes the position and pose of the pattern in the 3D space of the camera reference system. More specifically, a pattern reference system (x′, y′, z′; purple arrows in Fig. 2C) is defined, with the origin O′ placed over one of the dots, the x′ and y′ axes parallel to the edges of the plate, and the z′ axis perpendicular to it (i.e., parallel to the pillar).

In the ideal case in which the pattern had been precisely aligned to the anatomical axes of the head at the time of the implant, x′ corresponds to the head's anteroposterior axis, while y′ corresponds to the head's interaural axis. The camera reference system (x, y, z; black arrows in Fig. 2C) results instead from calibrating the camera using a standard procedure that consists in imaging a checkerboard pattern placed at various positions and orientations (see APPENDIX A). Once the camera is calibrated, the head-tracking algorithm provides 1) the three Cartesian coordinates of O′ in the camera reference system; and 2) the rotation matrix R that defines the 3D rotation bringing the pattern reference system (x′, y′, z′) to be aligned with the camera reference system (x, y, z). R is defined as

$$R = R_z(\alpha_C)\,R_y(\beta_C)\,R_x(\gamma_C), \qquad (1)$$

where R_z, R_y, and R_x are the elemental rotation matrices that define intrinsic rotations by the Euler angles α_C (yaw), β_C (pitch), and γ_C (roll) about the axes of the camera reference system. More specifically:

$$R_z(\alpha_C) = \begin{bmatrix} \cos\alpha_C & -\sin\alpha_C & 0 \\ \sin\alpha_C & \cos\alpha_C & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
R_y(\beta_C) = \begin{bmatrix} \cos\beta_C & 0 & \sin\beta_C \\ 0 & 1 & 0 \\ -\sin\beta_C & 0 & \cos\beta_C \end{bmatrix}, \quad
R_x(\gamma_C) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma_C & -\sin\gamma_C \\ 0 & \sin\gamma_C & \cos\gamma_C \end{bmatrix},$$

where, with reference to Fig. 2C: α_C is the angle between the projection of x′ onto the camera (x, y) plane and the camera x axis; β_C is the angle between x′ and the (x, y) plane; and γ_C is the rotation angle of the pattern around x′.
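A compact numerical counterpart of these definitions (composing R as in Eq. 1 and recovering the three angles from a given R, away from the pitch = ±90° singularity) is sketched below; the function names are ours.

```python
import numpy as np

def rot_from_euler(yaw, pitch, roll):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), as in Eq. 1 (angles in radians)."""
    ca, sa = np.cos(yaw), np.sin(yaw)
    cb, sb = np.cos(pitch), np.sin(pitch)
    cg, sg = np.cos(roll), np.sin(roll)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    return Rz @ Ry @ Rx

def euler_from_rot(R):
    """Recover (yaw, pitch, roll) from R, assuming |pitch| < 90 degrees."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```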

It should be noted that the three Euler angles α_C, β_C, and γ_C, as well as the three Cartesian coordinates of O′, are not immediately applicable to know the pose and position of the head in the environment. First, they are relative to the camera reference system, while the experimenter needs to know them with respect to some meaningful environmental landmarks (e.g., the floor and walls of the arena or the operant box where the animal is tested). Second, no matter how carefully the pattern is placed over the head of the rat, in general, the (x′, y′, z′) axes will not be perfectly aligned to the anatomical axes of the head—e.g., x′ and y′ will not be perfectly aligned to the head's anteroposterior and interaural axes. However, expressing the pose and position of the pattern in camera coordinates allows measuring the nominal precision and accuracy that the head tracker can achieve, which is essential to validate the system.


In Validation of the head tracker: nominal precision and accuracy in the camera reference system, we illustrate how we collected these validation measurements, and, in Operation of the head tracker: measuring displacements and rotations relative to reference poses in the physical environment, we explain how the actual position and pose of the head can be expressed in a convenient reference system, by collecting images of the checkerboard and dot pattern at, respectively, a reference position and pose.

Validation of the head tracker: nominal precision and accuracy in the camera reference system. To measure the precision and accuracy of the head tracker, we used a custom combination of breadboards, linear stages, rotary stages, and goniometers to hold the dot pattern in known 3D positions and poses (Fig. 3, A–C). This allowed comparing the ground-truth coordinates/angles of the pattern with the measurements returned by the head tracker (Fig. 3, D–F). For these measurements, the camera was positioned in such a way to have its optical axis perpendicular to the floor of the testing area.

We first measured the ability of the system to track two-dimensional (2D) displacements of the pattern over the floor of the testing area. To this aim, the pattern was placed over a 3D-printed breadboard with a grid of 5 × 5 holes, that were 37.5 mm and 25 mm apart along, respectively, the horizontal and vertical dimensions (Fig. 3A; the maximal error on the positioning of the pattern was ±0.15 mm and ±0.10 mm along the two dimensions; see APPENDIX B). For each of the 25 breadboard locations, we took a set of 30 repeated head-tracker measurements of the origin of the pattern in the camera reference system (i.e., O′ in Fig. 2C). Since the coordinates of the grid holes in such a reference system are not known a priori, a Procrustes analysis (Gower and Dijksterhuis 2004) was applied to find the optimal match between the set of 25 × 30 measures returned by the head tracker and the known, physical positions of the holes of the breadboard. Briefly, the Procrustes analysis is a standard procedure to optimally align two shapes (or two sets of points, as in our application) by uniformly rotating, translating, and scaling one shape (or one set of points) with respect to the other. In our analysis, since we compared physical measurements acquired with a calibrated camera, we did not apply the scale transformation (i.e., the scale factor was fixed to 1). When applied to our set of 25 × 30 measures, the Procrustes analysis returned a very good match with the set of 25 ground-truth positions of the grid (Fig. 3D, left). As shown by the virtually absent spread of the dots at each intersection, the root mean square error (RMSE) of each set of 30 measurements, relative to their mean, was very low, yielding a mean precision (across positions) of 0.056 ± 0.007 mm along the x-axis and 0.037 ± 0.005 mm along the y-axis (Fig. 3D, right; top bar plot). The RMSE of each set of 30 measurements, relative to the corresponding grid intersection, was also very low, yielding a mean accuracy (across positions) of 0.663 ± 0.079 mm along the x-axis and 0.268 ± 0.041 mm along the y-axis (Fig. 3D, right; bottom bar plot).
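With the scale factor fixed to 1, this alignment reduces to finding the optimal rigid transformation (rotation plus translation) between the two point sets, which has a closed-form solution. A minimal sketch, together with the per-axis RMSE used to quantify precision and accuracy, is given below; it is our own illustration, not the code used for the published analysis.

```python
import numpy as np

def rigid_procrustes(measured, ground_truth):
    """Align `measured` to `ground_truth` with a rotation and a translation
    only (scale fixed to 1), using the closed-form (Kabsch) solution.
    Both inputs are N x 2 (or N x 3) arrays of matched points."""
    mu_m, mu_g = measured.mean(axis=0), ground_truth.mean(axis=0)
    A, B = measured - mu_m, ground_truth - mu_g
    U, _, Vt = np.linalg.svd(A.T @ B)
    D = np.eye(U.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    R_rows = U @ D @ Vt                          # rotation applied to row vectors
    return A @ R_rows + mu_g                     # measurements in grid coordinates

def rmse_per_axis(points, reference):
    """RMSE of a cloud of repeated measurements along each axis, computed with
    respect to the cloud mean (precision) or the ground-truth point (accuracy)."""
    return np.sqrt(np.mean((points - np.asarray(reference)) ** 2, axis=0))
```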

Interestingly, zooming into the area of the grid intersections at the submillimeter scale (see Fig. 3D, inset) revealed that the spread of the position estimates was elongated along a specific direction in the (x, y) plane. This can be explained by the fact that the PnP algorithm, when it finds the position/pose of the pattern that yields the best match between the projection of the dots on the camera sensor and their actual position, as segmented from the acquired image (see APPENDIX A), naturally tends to make larger errors along the direction of sight of the camera (i.e., the axis connecting the camera's sensor to the pattern). In fact, small changes along this direction would result in projected images of the pattern that do not differ much among each other (in other words, this is the direction along which less information can be inferred about the position of the pattern).

Hence, the cloud of repeated position estimates would mainly spread along this direction. When these estimates are projected onto the (x, y) plane, the spread will still appear elongated in a given direction (as shown in the inset of Fig. 3D), depending on the relative position of the camera and the pattern's location (only if the pattern were located right below the camera, perpendicular to its optical axis, would the clouds of measurements be circularly symmetric, since, in this case, the measurements would spread along the z axis). It should also be noted that an elongated spread in the (x, y) plane automatically implies a difference in the accuracy of the x and y estimates. For instance, in the case of our measurements, the RMSE in x was about three times larger than in y (Fig. 3D, right; bottom bar plot), which is consistent with the larger spread of the measurements along the x-axis, as compared with the y-axis, that is apparent in the inset.

To verify the ability of the head tracker to estimate the height of the pattern (i.e., its displacement along the z-axis) when it was varied over a range of positions, the pattern was mounted on the arm of a stereotax (MyNeuroLab Angle One) through a 3D-printed custom joint (Fig. 3B). The arm was positioned in such a way to be perpendicular to the floor of the testing area, thus allowing the pattern to be displaced vertically by 34 consecutive 2-mm increments, with a resolution of 0.01 mm. After each increment, the z coordinate of the pattern was estimated by the head tracker in 30 repeated measurements, which were then compared with the total physical displacement of the stereotaxic arm up to that point (Fig. 3E, top). The resulting estimates were again very precise and accurate (Fig. 3E, bottom), with RMSE values that were close to those obtained previously for the x and y displacements (compare the z bars of Fig. 3E to the x and y bars of Fig. 3D).

For the validation of the angular measurements, we built a custom assembly, made of a rotary stage (Thorlabs MSRP01/M) mounted on a stereotaxic arm (MyNeuroLab Angle One), that allowed rotating the pattern about two orthogonal axes, corresponding to roll and pitch (Fig. 3C). Rotations about each axis were made in 10° steps, spanning from −60° to 60° roll angles and from −40° to 40° pitch angles, while the yaw was kept fixed at 0° (the resolution of the stereotaxic arm was 0.2°, while the rotary stage had an accuracy and precision of, respectively, 0.13° and 0.04°, measured as RMSE; see APPENDIX B). Again, 30 repeated head-tracker measurements were collected at each known combination of angles over the resulting 13 × 9 grid. As for the case of the 2D displacements, the angles returned by the head tracker were not immediately comparable with the nominal rotations on the rotary stages, because the two sets of angles are measured with respect to two different reference systems—i.e., the camera reference system (as defined in Fig. 2C) and the stage reference system (as defined by the orientation of the pattern in the physical environment, when the rotations of the stages are set to zero). Therefore, to compare the nominal rotations of the pattern on the stages (R_stages^nom) to their estimates provided by the head tracker (R_cam^est), we first had to express the former in the camera reference system (R_cam^nom). To this aim, we followed the same approach as Wallace et al. (2013) and computed R_cam^nom as


$$R_\mathrm{cam}^\mathrm{nom} = R_\mathrm{stage}^\mathrm{cam}\, R_\mathrm{stages}^\mathrm{nom}\, \left(R_\mathrm{stage}^\mathrm{cam}\right)^T R_\mathrm{cam}^{\mathrm{nom},0}, \qquad (2)$$

where R_cam^nom,0 is the pose of the pattern (in the camera reference system) when all the rotations of the stages are nominally set to zero (i.e., the reference rotation), and R_stage^cam is the matrix mapping the stage reference system into the camera reference system (note that each matrix in Eq. 2 is in the form shown in Eq. 1). The matrices R_stage^cam and R_cam^nom,0 are unknowns that can be estimated by finding the optimal match between the nominal and estimated rotations in camera coordinates, i.e., between R_cam^nom, as defined in Eq. 2, and R_cam^est. Following Wallace et al. (2013), we defined the rotation difference matrix R_diff = R_cam^est (R_cam^nom)^T, from which we computed the error of a head-tracker angle estimate as the total rotation in R_diff, i.e., as

$$\mathrm{err} = \cos^{-1}\!\left(\frac{\mathrm{trace}(R_\mathrm{diff}) - 1}{2}\right).$$

By minimizing the sum of err² over all tested rotations of the stages (using the MATLAB fminsearch function), we obtained a very close match between estimated and nominal stage rotations. This is illustrated in Fig. 3F, left, where the red dots are the head-tracker estimates and the grid intersections are the nominal rotations. As for the case of the Cartesian displacements, also for the pitch and roll angles the head tracker returned very precise (roll: RMSE = 0.095 ± 0.005°; pitch: RMSE = 0.109 ± 0.006°) and accurate (roll: RMSE = 0.557 ± 0.045°; pitch: RMSE = 0.381 ± 0.030°) measurements (Fig. 3F, right).
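The corresponding optimization can be reproduced, for instance, with a Nelder-Mead search over the six Euler angles that parameterize the two unknown matrices of Eq. 2. The sketch below (using SciPy in place of MATLAB's fminsearch) is an illustration of the procedure, not the code used for the published fits, and assumes the stage rotations keep the search away from local minima.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def rotation_error_deg(R_est, R_nom):
    """Total rotation (deg) of R_diff = R_est @ R_nom.T, as defined above."""
    c = np.clip((np.trace(R_est @ R_nom.T) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(c))

def fit_reference_rotations(R_stages_nom, R_cam_est):
    """Estimate R_stage^cam and R_cam^nom,0 (Eq. 2) by minimizing the summed
    squared angular error over all tested stage rotations. Each unknown is
    parameterized by intrinsic Z-Y-X Euler angles, matching Eq. 1."""
    def euler_to_R(angles):
        return Rotation.from_euler("ZYX", angles).as_matrix()

    def cost(params):
        R_stage_cam = euler_to_R(params[:3])
        R_cam_nom0 = euler_to_R(params[3:])
        errs = [rotation_error_deg(R_e,
                                   R_stage_cam @ R_n @ R_stage_cam.T @ R_cam_nom0)
                for R_n, R_e in zip(R_stages_nom, R_cam_est)]
        return float(np.sum(np.square(errs)))

    res = minimize(cost, x0=np.zeros(6), method="Nelder-Mead")
    return euler_to_R(res.x[:3]), euler_to_R(res.x[3:])
```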

Operation of the head tracker: measuring displacements and rotations relative to reference poses in the physical environment. While the validation procedure described in Validation of the head tracker: nominal precision and accuracy in the camera reference system provides an estimate of the nominal precision and accuracy of the head tracker, measuring Cartesian coordinates and Euler angles in the camera reference system is impractical. To refer the head-tracker measurements to a more convenient reference system in the physical environment, we 3D printed a custom adapter to precisely place the checkerboard pattern used for camera calibration over the block holding the feeding needles, in such a way to be parallel to the floor of the operant box, with the vertex of the top-right black square vertically aligned with the central needle (see Fig. 4A). We then acquired an image of this reference checkerboard, which served to establish a new reference system (x″, y″, z″), where the x″ and y″ axes are parallel to the edges at the base (the floor) of the operant box, while z″ is perpendicular to the floor and passes through the central feeding needle. The position measurements returned by the head tracker can be expressed as (x″, y″, z″) coordinates by applying the rotation matrix R_box^cam and the translation vector t_box^cam that map the camera reference system into this new operant box reference system. R_box^cam is in the form shown in Eq. 1, but with the angles referring to the rotation of the reference checkerboard with respect to the camera reference system; t_box^cam = (Δt_x, Δt_y, Δt_z), where Δt is the distance between the origins of the two reference systems along each axis.
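In code, this change of coordinates amounts to one matrix-vector product and one addition. The sketch below assumes that R_box^cam and t_box^cam directly map camera coordinates into box coordinates; if the calibration routine instead returns the pose of the checkerboard in camera coordinates, the inverse transform has to be applied.

```python
import numpy as np

def camera_to_box(p_cam, R_box_cam, t_box_cam):
    """Express a point given in camera coordinates in the operant-box
    reference system defined by the reference checkerboard, assuming
    p_box = R_box_cam @ p_cam + t_box_cam."""
    return np.asarray(R_box_cam) @ np.asarray(p_cam) + np.asarray(t_box_cam)
```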

We tested the ability of the head tracker to correctly recover the Cartesian coordinates of the dot pattern relative to the box reference system, by placing the pattern over the 5 × 5 grid shown in Fig. 3A and collecting 30 repeated head-tracker measurements at each location.

To verify the functioning of the head tracker under the same settings used for the behavioral and neurophysiological experiments (see Head movements of a rat performing a two-alternative forced choice task in the visual and auditory modalities), the camera was not centered above the operant box with its optical axis perpendicular to the floor, as previously done for the validation shown in Fig. 3. Otherwise, during a neuronal recording session, the cable that connects the headstage protruding from the rat head to the preamplifier would partially occlude the camera's field of view. Hence the need to place the camera in front of the rat, above the stimulus display, oriented with an angle of ~50° relative to the floor (see Fig. 1B). This same positioning was used here to collect the set of validation measurements over the 5 × 5 grid. As shown in Fig. 4C, the match between estimated and nominal horizontal (x″) and vertical (y″) coordinates of the pattern was very good (compare the red dots to the grid intersections), with a barely appreciable dispersion of the 30 measurements around each nominal position value. This resulted in a good overall precision (x: RMSE = 0.034 ± 0.004 mm; y: 0.158 ± 0.010 mm) and accuracy (x: RMSE = 0.403 ± 0.050 mm; y: 1.36 ± 0.093 mm) of the x″ and y″ measurements.

The Euler angles defining the pose of the head in 3D space could also be measured relative to the operant box reference system (x″, y″, z″). However, this would yield an estimate of the rotation of the dot pattern, rather than of the rat head, in the physical environment. In fact, no matter how carefully the support holding the pattern is implanted at the time of the surgery, it is unlikely for the pattern to be perfectly aligned to the anatomical axes of the head, once put in place. In general, the x′ and y′ axes of the pattern reference system (see Fig. 2B) will be slightly tilted with respect to the head's anteroposterior and interaural axes. Therefore, it is more convenient to acquire an image of the pattern when the rat is placed in the stereotax, with its head parallel to the floor of the operant box, and then use such a pose zero of the pattern as a reference for the angular measurements returned by the head tracker. This can be achieved by defining a pose zero reference system (x′0, y′0, z′0) and a rotation matrix R_cam^pose0 mapping the camera reference system into this new coordinate system [the matrix is in the form shown in Eq. 1, but with the angles referring to the rotation of the pose zero of the pattern with respect to the camera reference system].
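A sketch (our reconstruction, not the authors' code) of how the tracker output can be read out relative to the pose zero: the rotation returned for the current frame is composed with the pose-zero rotation, and Euler angles are extracted from the relative rotation. The ZYX (yaw-pitch-roll) convention used below is an assumption; the actual convention should follow Eq. 1 of the paper.

import numpy as np

def euler_zyx(R):
    """Yaw (about z), pitch (about y), roll (about x) from a rotation matrix,
    assuming the ZYX composition order (an assumption, see lead-in)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.degrees([yaw, pitch, roll])

def head_angles_relative_to_pose_zero(R_pat_to_cam, R_pose0_to_cam):
    """Rotation of the pattern with respect to its pose zero, as Euler angles."""
    R_rel = R_pose0_to_cam.T @ R_pat_to_cam
    return euler_zyx(R_rel)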

To test the ability of the head tracker to correctly recover the Euler angles of the dot pattern relative to the pose zero reference system, we mounted the pattern over a custom assembly made of two 3D-printed goniometers, each allowing rotations over a span of 70° (from −35° to +35°), and a rotary stage, enabling 360° rotations (Fig. 4B). This allowed setting the pose of the pattern over a grid of known 3×7×5 combinations of yaw, pitch, and roll angles (the accuracy and precision of such ground-truth rotations were 0.07° and 0.02° for the yaw; 0.05° and 0.03° for the pitch; and 0.13° and 0.04° for the roll; all measured as RMSE; see APPENDIX B). As illustrated in Fig. 4D, left, we found a good match between the nominal and estimated pattern rotations (note that not all 3×7×5 angle combinations were actually tested, since the dots on the pattern were not detectable at some extreme rotations). When averaged across all tested angle combinations, the


resulting precision and accuracy (Fig. 4D, right) were very similar to the nominal ones shown in Fig. 3F (precision: roll 0.068° ± 0.003°, pitch 0.076° ± 0.004°, yaw 0.0409° ± 0.001°; accuracy: roll 0.746° ± 0.036°, pitch 0.598° ± 0.048°, yaw 0.929° ± 0.052°).

Head movements of a rat performing a two-alternative forced choice task in the visual and auditory modalities. To illustrate the application of our head tracker in vivo, we implanted a rat with an electrode array targeting the hippocampus (see MATERIALS AND METHODS and Simultaneous head-tracking and neuronal recordings during a two-alternative forced choice discrimination task for details). The implant also included a base with a magnet that allowed attaching the dot pattern during the behavioral/recording sessions (see Fig. 2, A–B). As previously explained, the animal had to interact with an array of three response ports, each equipped with a feeding needle and a proximity sensor (Fig. 1). Licking the central port triggered the presentation of either a visual object (displayed on the screen placed in front of the rat) or a sound (delivered through the speakers located on the side of the monitor). Two different visual objects could be presented to the animal (same as in Alemi-Neissi et al. 2013; Zoccolan et al. 2009)—object 1 required the rat to approach and lick the left response port to obtain liquid reward, while object 2 required him to lick the

Fig. 4. Validation of the head tracker in the reference system of the operant box. A: CAD rendering of the top view of the 3D-printed block (cyan) holding the three feeding needles that worked as response ports in the operant box. The figure shows how the checkerboard pattern used to calibrate the camera was placed over the response port block, in such a way to be parallel to the floor of the operant box and with its top-right vertex vertically aligned with the central needle. An image of the checkerboard pattern in such a reference position was acquired, so as to express the position measurements returned by the head tracker in a reference system with the x and y axes parallel to the edges at the base of the operant box, and the z axis perpendicular to the floor and passing through the central feeding needle. B: CAD rendering of the custom assembly of two 3D-printed goniometers and a rotary stage that was used to obtain the validation measurements shown in D. C, left: validation measurements (red dots) obtained by placing the dot pattern on a grid of 5 × 5 ground-truth positions (grid intersections; same as in Fig. 3D), over the floor of the operant box. Each measurement is relative to the box reference system, defined as described in A. The inset allows appreciating the spread of the 30 measurements taken at one of the tested positions. Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the 25 tested positions. RMSE, root mean square error. D, left: validation measurements (red dots) obtained by setting the yaw, pitch, and roll angles of the dot pattern to a combination of 3 × 7 × 5 ground-truth values (grid intersections), using the custom assembly shown in B. Each measurement is relative to a pose zero reference system, obtained by acquiring an image of the dot pattern with all the angles on the custom assembly set to zero. Right: mean precision (top) and accuracy (bottom) of the head-tracker measurements over the set of tested angle combinations.


right response port (Fig. 5A). The same applied to the two sounds, one associated to the left and the other to the right response port.

Figure 5B shows example images captured and processed by the head tracker in three representative epochs during the execution of the task (the colored lines are the x′, y′, and z′ axes that define the pose of the pattern, as inferred in real time by the head tracker; see the Supplemental Video: https://doi.org/10.5281/zenodo.2789936). By tracking the position of the pattern in the 2D plane of the floor of the operant box, it was possible to plot the trajectory of the rat's nose in each individual trial of the behavioral test (Fig. 5C, red versus blue lines, referring to trials in which the animal chose, respectively, the left and right response port). The position of the nose was monitored because it better reflects the interaction of the animal with the response sensors, as compared with the position of the pattern. The latter can be converted into the nose position by carefully measuring the distance between the origin of the pattern and the tip of the nose along each of the axes of the pattern reference system at the time of the surgery (such distances were: 30.84, 1.5, and 22.16 mm along, respectively, the x′, y′, and z′ axes). Importantly, despite the presence of possible obstacles that could occlude the view of the pattern (i.e., the cable connected to the headstage for neuronal

Fig. 5. Head tracking of a rat engaged in a perceptual discrimination task. A: illustration of the visual discrimination task. The rat learned to lick the central response port to trigger the presentation of either object 1 or object 2 on the stimulus display placed in front of the operant box (see Fig. 1A). Presentation of object 1 required the rat to approach and lick the response port on the left to correctly report its identity, while presentation of object 2 required the animal to lick the response port on the right. B: example snapshots captured and processed by the head tracker at three representative times—i.e., when the rat licked the central, the left, and the right response ports. The colored lines are the x′ (red), y′ (green), and z′ (blue) axes of the reference system centered on the dot pattern (see Fig. 2C), as inferred in real time by the head tracker (see also the Supplemental Video: https://doi.org/10.5281/zenodo.2789936). C: the trajectories of the nose of the rat in consecutive trials during the execution of the task are superimposed on a snapshot of the block holding the response ports, as imaged by the head tracker. The red and blue traces refer to trials in which the animal chose, respectively, the left and right response port. The trajectories are plotted in the Cartesian plane corresponding to the floor of the operant box, where the x and y axes (black arrows) are, respectively, parallel and orthogonal to the stimulus display (see also Fig. 1A). D: the bar plot shows the distribution of the percentage of frames lost because of segmentation failure across the trials recorded in the experiment of Figs. 6 and 7.


recordings), and despite the fact that the head could undergo large rotations (see Fig. 8), only a tiny fraction of frames was dropped due to segmentation failure. As shown in Fig. 5D, in the majority of the trials (56.3% of the total) less than 10% of the frames were lost per trial, and only in 2.1% of the trials did we observe more than 50% of dropped frames.
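The pattern-to-nose conversion described above can be sketched as follows (our reconstruction, not the authors' code): the surgery-time offsets along the pattern axes x′, y′, z′ are rotated into the camera frame and added to the tracked pattern origin. The offset signs depend on how the pattern axes are oriented with respect to the snout, so the values below should be taken as illustrative.

import numpy as np

# Distances between pattern origin and nose tip along x', y', z' (mm), from the text.
NOSE_OFFSET_PATTERN = np.array([30.84, 1.5, 22.16])

def nose_position(R_pat_to_cam, t_pat_in_cam, offset=NOSE_OFFSET_PATTERN):
    """Nose position in camera coordinates, given the pattern pose
    (rotation matrix and translation vector) estimated by the head tracker."""
    return np.asarray(t_pat_in_cam, float) + R_pat_to_cam @ offset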

With such a precise and reliable knowledge of the position and pose of the nose/head in space and time, we could address questions concerning the perceptual, motor, and cognitive processes deployed by the rat during the visual and auditory discriminations, well beyond the knowledge that can be gained by merely monitoring the response times collected by the proximity sensors.

For example, in Fig. 6A, left, we have reported the x position of the rat's nose as a function of time in all the behavioral trials collected over 14 consecutive test sessions (total of 3,413 trials). The traces are aligned to the time in which the stimulus was presented (i.e., 300 ms after the animal had licked the central port). From these traces, we computed the reaction time (RcT) of the rat in each trial, defined as the time, relative to the stimulus onset, in which the animal left the central sensor to start a motor response, eventually bringing its nose to reach either the left or right response port. As can be appreciated by looking at the spread of both sets of traces, RcT was highly variable across trials, ranging from 300 ms to 1,166 ms (extreme values not considering outliers), with a median around 530 ms (Fig. 6A, right). By contrast, a much smaller variability was observed for the ballistic response time (BRsT), i.e., the time taken by the rat to reach the left or right response port, relative to the last time he passed by the central port (Fig. 6B, left), with BRsT ranging between 66 and 566 ms (Fig. 6B, right: top box plot; median ~300 ms). This suggests that much of the variability in the total response time (ToRsT; i.e., the time taken to reach the left or right response port, relative to the stimulus onset; Fig. 6B, right: bottom box plot) has to be attributed to the perceptual/decisional process required for the correct identification of the stimulus. However, a close inspection of the response trajectories of Fig. 6B (aligned to the time in which the ballistic motor response was initiated) shows that the rat, after leaving the central port in response to the stimulus, often did not point directly to the port that he would eventually choose (referred to as the selected response port in what follows). In many trials, the final ballistic response was preceded by earlier movements of the head, either toward the selected response port or the opposite one (referred to as the opposite response port in what follows).
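As a simplified illustration of how a reaction time can be read out from the tracked nose trajectory (the authors' exact criterion relies on the central sensor; this sketch, with hypothetical threshold values, only approximates it):

import numpy as np

def reaction_time(t, x_nose, t_stim, x_central=0.0, leave_threshold=5.0):
    """t: frame times (s); x_nose: nose x position (mm) per frame;
    t_stim: stimulus onset time; leave_threshold: hypothetical distance (mm)
    from the central port beyond which the head is considered to have left it."""
    after_onset = t >= t_stim
    left_port = np.abs(np.asarray(x_nose) - x_central) > leave_threshold
    idx = np.flatnonzero(after_onset & left_port)
    return t[idx[0]] - t_stim if idx.size else np.nan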

To better investigate the rat's response patterns, we separated the recorded trajectories into three response patterns. In the first response pattern (P1), the rat, after leaving the central port, made a direct ballistic movement to the selected response port (either left or right; Fig. 6C). In the second response pattern (P2), the rat made an initial movement toward the opposite response port, before correcting himself and making a ballistic movement toward the selected response port (Fig. 6D). In the third response pattern (P3), the rat made an initial movement toward the selected response port but then moved back to the central port, before approaching again, with a ballistic movement, the selected port (Fig. 6E). Interestingly, RcT was significantly smaller in P2 than in P1 (P < 0.001; two-tailed, unpaired t test; Fig. 6F, leftmost bar plot), suggesting that the trials in which the animal reversed his initial decision were those in which he

made an "impulsive" choice that he eventually corrected. As expected, the motor response time (MoRsT; i.e., the time taken by the animal to reach the selected response port, after leaving, for the first time, the central port) was substantially lower in P1, as compared with P2 and P3, given the indirect trajectories that the latter trial types implied (P < 0.001; two-tailed, unpaired t test; Fig. 6F, middle bar plot). By contrast, BRsT was slightly, but significantly, higher in P1 than in P2 and P3, indicating that ballistic movements were faster when they followed a previously aborted choice (P < 0.001; two-tailed, unpaired t test; Fig. 6F, rightmost bar plot).

To better understand this phenomenon, we plotted the average velocity of the rat's nose as a function of time for the three response patterns (Fig. 6G; curves are aligned to the onset of the ballistic motor response). As expected, in P1, the velocity was close to zero until the time in which the ballistic response was initiated. By contrast, in P2, the velocity was already high at the onset of the ballistic movement, because the animal's nose passed by the central sensor when sweeping from the opposite response port to the selected one (see Fig. 6D). This momentum of the rat's head was thus at the origin of the faster ballistic responses in P2, as compared with P1. In the case of P3, there was no appreciable difference in velocity with respect to P1 at the onset of the ballistic movements. This is expected, given that the rat moved twice in the direction of the selected port and, in between the two actions, he stopped at the central sensor (see Fig. 6E). However, following the onset of the ballistic movement, the rat reached a larger peak velocity in P3 than in P1, which explains the shorter time needed to complete the ballistic response in the former trial type. This may possibly indicate a larger confidence of the rat in his final choice, following an earlier, identical response choice that was later aborted. More generally, the response patterns P2 and P3 appear to be consistent with the behavior known as vicarious trial and error, which reflects the deliberation process deployed by rats when considering possible behavioral options (Redish 2016).

We also used the head tracker to inquire whether the rat deployed different response/motor patterns depending on the sensory modality of the stimuli he had to discriminate. Rat performance was higher in the sound discrimination task, as compared with the visual object discrimination task (P < 0.001; two-tailed, unpaired t test; Fig. 7A). Consistent with this observation, the fraction of trials in which the animal aborted an initial perceptual choice (i.e., response patterns P2 and P3), as opposed to making a direct response to the selected port (i.e., response pattern P1), was significantly larger in visual than in auditory trials (P < 0.01, χ2 test for homogeneity; Fig. 7B). This means that the animal was less certain about his initial decision in the visual trials, displaying a tendency to correct such decisions more often, as compared with the auditory trials. Interestingly, the lower perceptual discriminability of the visual stimuli did not translate into a general tendency of reaction times to be longer in visual than auditory trials. When the animal aimed directly at the selected port (P1; the vast majority of trials, as can be appreciated in Fig. 7B), no significant difference in RcT was observed between visual and auditory trials (P > 0.05; two-tailed, unpaired t test; Fig. 7C, top left bar plot). By contrast, in trials in which the rat corrected his initial decision (P2), RcT was significantly longer in visual than auditory trials (P < 0.01; two-tailed, unpaired t


test; Fig. 7C, middle left bar plot), but the opposite trend was found in trials in which the animal swung back and forth to the selected response port (P3; P < 0.05; two-tailed, unpaired t test; Fig. 7C, bottom left bar plot). We found instead a general tendency of the rat to make faster ballistic responses in visual than auditory trials, with this trend being significant in P1 and P2 (P < 0.001; two-tailed, unpaired t test; Fig. 7D, top right and middle right bar plots). It is hard to interpret these findings,

Fig. 6. Statistical characterization of the head's displacements performed by the rat during the perceptual discrimination task. A, left: position of the rat's nose as a function of time, along the x axis of the Cartesian plane corresponding to the floor of the operant box (i.e., x-axis in Fig. 5C). The traces, recorded in 3,413 trials over the course of 14 consecutive sessions, are aligned to the time in which the stimulus was presented (gray dashed line). The black dashed line indicates the mean reaction time (as defined in the main text). The red and blue colors indicate trials in which the rat chose, respectively, the left and right response port. The dashed lines highlight two specific examples of left and right responses, while the thick lines are the averages of the left and right response trajectories. Right: box plot showing the distributions of reaction times for the two classes of left and right responses. B, left: same trajectories as in A, but aligned to the last time the rat passed by the central port (dashed line). Right: box plots showing the distributions of ballistic response times (top) and total response times (bottom) for the two classes of left and right responses (see RESULTS for definitions). C: subset of the response patterns (referred to as P1) shown in A and B, in which the rat, after leaving the central port, made a direct ballistic movement to the selected response port (trace alignment as in B). D: subset of the response patterns (referred to as P2) shown in A and B, in which the rat, before making a ballistic movement toward the selected response port, made an initial movement toward the opposite port (trace alignment as in B). E: subset of the response patterns (referred to as P3) shown in A and B, in which the rat made an initial movement toward the selected response port, then moved back to the central port, and finally made a ballistic movement to reach the selected port (trace alignment as in B). F: comparisons among the mean reaction times (left), the mean motor response times (middle; see main text for a definition), and the mean ballistic response times (right) that were measured for the three types of trajectories (i.e., P1, P2, and P3) shown, respectively, in C, D, and E (***P < 0.001; two-tailed, unpaired t test). The number of trajectories of a given type is reported on the corresponding bar. Error bars are SE of the mean. G: velocity of the rat's nose as a function of time for the three types of trajectories P1, P2, and P3.


which could indicate a larger impulsivity of the animal in the visual discrimination but, possibly, also a larger confidence in his decision. Addressing this issue in more depth is obviously beyond the scope of our study, since it would require measuring the relevant metrics (e.g., RcT and BRsT) over a cohort of animals, while our goal here was simply to provide an in vivo demonstration of the working principle of our head tracker and suggest possible ways of using it to investigate the perceptual, decisional, and motor processes involved in a perceptual discrimination task.

As a further example of the kind of behavioral information that can be extracted using the head tracker, we analyzed the pose of the rat's head during the execution of the task (Fig. 8). As shown in Fig. 8B, at the time the rat triggered the stimulus presentation (time 0), his head was, on average across trials (thick curves), parallel to the floor of the operant box (i.e., with pitch 0° and roll close to 0°) and facing frontally the stimulus display (i.e., with yaw close to 0°). However, at the level of single trials (thin curves), the pose of the head was quite variable, with approximately a ±30° excursion in the pitch, and a ±15°/20° excursion in the roll and yaw. This can also be appreciated by looking at the polar plots of Fig. 8C (central column), which report the average pitch, roll, and yaw angles that were measured over the 10 frames (~300 ms) following the activation of each response port (thin lines: individual trials; thick lines: trial averages). Being aware of such variability is especially important in behavioral and neurophysiological studies of rodent visual perception (Zoccolan 2015). In fact, a variable pose of the head at the time of stimulus presentation implies that the animal viewed the stimuli under quite different angles across repeated behavioral trials. This, in turn, means that the rat had to deal with a level of variation in the appearance of the stimuli on his retina that was larger than that

imposed, by design, by the experimenter. In behavioral experiments where the invariance of rat visual perception is under investigation, this is not an issue, because, as observed in Alemi-Neissi et al. (2013), it can lead at most to an underestimation (not to an overestimation) of rat invariant shape-processing abilities. However, in studies where a tight control over the retinal image of the visual stimuli is required, the trial-by-trial variability reported in Fig. 8, B and C, indicates that the use of a head tracker is necessary to measure, and possibly compensate for, the change of viewing angle occurring across repeated stimulus presentations. This applies, for instance, to neurophysiological studies of visual representations in unrestrained (i.e., not head-fixed) rodents, especially when localized stimuli (e.g., visual objects) are used to probe low- and middle-level visual areas. For instance, head tracking, ideally also paired with eye tracking, would be necessary to investigate putative ventral stream areas in unrestrained rats, as recently done in anesthetized (Matteucci et al. 2019; Tafazoli et al. 2017) or awake but head-fixed animals (Kaliukhovich and Op de Beeck 2018; Vinken et al. 2014, 2016, 2017). By contrast, the issue of pose variability is less of a concern for neurophysiological studies targeting higher-order association or decision areas, especially when large (ideally full-field) periodic visual patterns (e.g., gratings) are used (Nikbakht et al. 2018).

Monitoring the pose of the head during the behavioral trials also revealed that the rat approached the lateral response ports with his head at port-specific angles (compare red versus blue curves in Fig. 8B, and left versus right columns in Fig. 8C). Unsurprisingly, the yaw angle was the opposite for left (~40°) and right (about −35°) responses, since the animal had to rotate his head toward opposite directions to reach the lateral response ports (see Fig. 8A, red arrows). It was less obvious to observe opposite rotations also about the roll and pitch axes,

Fig. 7. Statistical comparison of the head displacements performed by the rat in visual and auditory trials. A: comparison between the performances attained by the rat in the visual (light gray) and auditory (dark gray) discrimination tasks (***P < 0.001; two-tailed, unpaired t test). B: proportion of response patterns of type P1, P2, and P3 (see Fig. 6, C–E) observed across the visual (light gray) and auditory (dark gray) trials. The two distributions were significantly different (P < 0.01, χ2 test). C: comparisons between the mean reaction times (left) and between the mean ballistic response times (right) that were measured in visual (light gray) and auditory (dark gray) trials for each type of trajectory (i.e., P1, P2, and P3). *P < 0.05, **P < 0.01, ***P < 0.001; two-tailed, unpaired t test. In every plot, the number of trajectories of a given type is reported on the corresponding bar.


which indicates that the rat bent his head in port-specific ways to reach each response port and lick from the corresponding feeding needle. Again, this information is relevant for behavioral studies of rodent visual perception, where the stimulus is often left on the screen after the animal makes a perceptual choice and during the time he retrieves the liquid reward (Alemi-Neissi et al. 2013; Djurdjevic et al. 2018; Rosselli et al. 2015; Zoccolan et al. 2009). This implies that the animal experiences each stimulus from a port-specific (and therefore stimulus-specific) viewing angle for a few seconds after the choice. As pointed out in Djurdjevic et al. (2018), this can explain why rats learn to select specific stimulus features to process a given visual object.

Simultaneous head-tracking and neuronal recordings during a two-alternative forced choice discrimination task. To illustrate how our head tracker can be combined with the recording of neuronal signals, we monitored the head movements of the rat during the execution of the visual/auditory discrimination task, while recording the activity of hippocampal neurons in CA1. Given that, in rodents, hippocampal neurons often code the position of the animal in the environment (Moser et al. 2008, 2015), we first built a map of the places visited by the rat while performing the task (Fig. 9A,

top). It should be noted that, differently from typical hippocampal studies, where the rodent is allowed to freely move inside an arena, the body of the rat in our experiment was at rest, while his head, after entering through the viewing hole, made small sweeps among the three response ports. Therefore, the map of visited positions shown in Fig. 9A refers to the positions of the rat's nose, rather than to his body. Thus, by binning the area around the response ports, we obtained a density map of the locations visited by the nose (Fig. 9A, bottom). Not surprisingly, this map revealed that the rat spent most of the time with his nose very close to one of the response ports (red spots). Figure 9B shows instead the average firing rate of an example, well-isolated hippocampal neuron (whose waveforms and interspike interval distribution are shown in Fig. 9C) as a function of the position of the rat's nose (only locations that were visited more than 10 times were considered). The resulting spatial map displayed a marked tendency of the neuron to preferentially fire when the rat approached the right response port. In other words, this neuron showed the sharp spatial tuning that is typical of hippocampal place cells, but over a much smaller spatial extent (i.e., over the span of his head movements around the response ports) than typically observed in freely moving rodents.
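The occupancy and rate maps of Fig. 9, A and B, can be sketched along the following lines (our reconstruction, with the bin size, visit threshold, and smoothing kernel taken from the figure legend; variable names are ours):

import numpy as np
from scipy.ndimage import gaussian_filter

def place_maps(x, y, spikes, bin_px=(20, 25), min_visits=10, sigma_bins=1.2):
    """x, y: nose position (pixels) per 33-ms frame; spikes: spike count per frame."""
    xi = (np.asarray(x) // bin_px[0]).astype(int)
    yi = (np.asarray(y) // bin_px[1]).astype(int)
    nx, ny = xi.max() + 1, yi.max() + 1
    visits = np.zeros((nx, ny))
    spk = np.zeros((nx, ny))
    np.add.at(visits, (xi, yi), 1)          # occupancy: number of visits per bin
    np.add.at(spk, (xi, yi), spikes)        # total spikes per bin
    density = np.log(visits + 1)            # log density, as in Fig. 9A, bottom
    rate = np.where(visits >= min_visits, spk / np.maximum(visits, 1), np.nan)
    rate = gaussian_filter(np.nan_to_num(rate), sigma=sigma_bins)  # smoothed field
    return density, rate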

Fig. 8. Statistical characterization of the head's rotations performed by the rat during the perceptual discrimination task. A: illustration of the yaw and roll rotations that the rat's head can perform, relative to its reference pose (see main text). B: time course of the pitch (left), roll (middle), and yaw (right) angles during repeated trials of the perceptual discrimination task. The red and blue colors indicate trials in which the rat chose, respectively, the left and right response port. Traces are aligned to the time in which the stimulus was presented (dashed gray line). The mean total response time (i.e., the mean time of arrival at the selected response port) is shown by the dashed black line. The thick colored lines are the averages of the left and right response trajectories. C: the pitch (top), roll (middle), and yaw (bottom) angles measured by the head tracker in individual trials (thin blue lines) and, on average, across all trials (thick black lines), at the times the rat reached the left, central, and right response ports.


To verify that the neuron mainly coded spatial information, we plotted its activity as a function of time, across the various epochs of the discrimination task, as well as a function of the sensory stimuli the rat had to discriminate and of the correctness of his choices. This is illustrated by the raster plots of Fig. 9D, top, where each dot shows the time at which the neuron fired an action potential, before and after the animal made a perceptual choice (i.e., before and after he licked one of the lateral response ports). Individual rows show the firing patterns in repeated trials during the task, with the color coding the identity of the stimulus. In addition, the trials are grouped

according to whether the rat's response was correct (left) or not (right). Each raster plot was then used to obtain the periresponse time histograms (PRTHs) shown in the bottom of Fig. 9D, where the average firing rate of the neuron across trials of the same kind is reported as a function of time. As expected for a place cell, and consistently with the spatial tuning shown in Fig. 9B, the neuron started to respond vigorously after the rat licked the right response port and did so regardless of whether the stimulus was auditory or visual (see, respectively, the green and blue dots/lines in Fig. 9D). In addition, the neuron also fired when the rat licked the right port in response to the visual

Fig. 9. Place field of a rat hippocampal neuron during the execution of the perceptual discrimination task. A, top: map of the places visited by the nose of the rat (red dots) around the area of the response ports, while the animal was performing the discrimination task. Bottom: density map of visited locations around the response ports. The map was obtained by 1) dividing the image plane in spatial bins of 20 × 25 pixels; 2) counting how many times the rat visited each bin; and 3) taking the logarithm of the resulting number of visits per bin (so as to allow a better visualization of the density map). B: place field of a well-isolated hippocampal neuron (as shown in C). The average firing rate of the neuron (i.e., the number of spikes fired in a 33-ms time bin, corresponding to the duration of a frame captured by the head tracker) was computed across all the visits the rat's nose made to any given spatial bin around the response ports (same binning as in A). Only bins with at least 10 visits were considered, and the raw place field was smoothed with a Gaussian kernel with sigma = 1.2 bins. C: superimposed waveforms (left) and interspike interval distribution (right) of the recorded hippocampal neuron. D: time course of the activity of the neuron during the execution of the task. The dots in the raster plots (top) indicate the times at which the neuron fired an action potential (or spike). The periresponse time histograms (PRTHs) shown in the bottom graphs were obtained from the raster plots by computing the average number of spikes fired by the neuron across repeated trials in consecutive time bins of 0.48 ms. The color indicates the specific stimulus condition that was presented in a given trial (see key in the figure). The panels on the left refer to trials in which the rat's choice was correct, while the panels on the right refer to trials in which his response was incorrect. Both the raster plots and the PRTHs are aligned to the time of arrival at the response port (time 0; black dashed line). The gray dashed line shows the mean stimulus presentation time, relative to the time of the response.


stimulus (object 1) that required a response to be delivered on the left port, i.e., on trials in which his choice was incorrect (red dots/line).

Obviously, our analysis of this example neuronal response pattern is far from being systematic. It is simply meant to provide an illustration of how our lightweight, portable head-tracking system can be applied to study the motor/behavioral patterns of small rodents (and their neuronal underpinnings) in experimental contexts where the animals do not navigate large environments but are confined to a restricted (often small) span of spatial locations.

DISCUSSION

In this study, we designed, implemented, and validated a novel head-tracking system to infer the three Cartesian coordinates and the three Euler angles that define the pose of the head of a small mammal in 3D. Algorithmically, our approach draws from established computer vision methods but combines them in a unique and original way to achieve the high accuracy, precision, and speed we demonstrated in our validation measurements (Figs. 3 and 4). This is typical of computer vision applications, where the challenge is often to properly choose, among the battery of available approaches, those that can be most effectively combined to solve a given problem, so as to craft a reliable and efficient application. In our case, we built our head-tracking software incrementally, by adding specific algorithms to solve a given task (e.g., feature extraction), until the task was accomplished with sufficient reliability (e.g., minimal risk of segmentation failure) and the whole tracking system reached submillimeter and subdegree precision and accuracy. As a result, our method, when compared with existing approaches, has several distinctive features that make it ideal for many applications in the field of behavioral neurophysiology.

First, to our knowledge, this is the only methodological study that describes in full detail a videography-based method for 3D pose estimation of the head, while providing at the same time a systematic validation of the system, a demonstration of its application in different experimental contexts, and the source code of the tracking algorithm, along with an example video to test it (see MATERIALS AND METHODS). A few other descriptions of head trackers for 3D pose estimation can be found in the neuroscience literature, but only as supplementary sections of more general studies (Finkelstein et al. 2015; Wallace et al. 2013). As a consequence, the details of the image processing and tracking algorithms are not covered as exhaustively as in our study (see MATERIALS AND METHODS and APPENDIX A), and the sets of validation measurements to establish the precision and accuracy of the methods are not always reported. This makes it hard for a potential user to implement the method from scratch and have an intuition of its validity in a specific application domain, also because neither open-source code nor freeware software packages implementing the tracking algorithms are available.

A second, key feature of our method is the minimal weight and encumbrance of the dot pattern implanted on the animal's head, along with the fact that the implant, being passively imaged by the camera of the head tracker, does not contain active elements that need to be electrically powered (Figs. 1 and 2). Achieving such a purely passive tracking was a major

challenge in the development of our method, which we solved by applying a cascade of feature extraction algorithms that progressively refine the detection of the dots on the pattern (see APPENDIX A). This sets our head tracker apart from existing approaches, based either on videography (Finkelstein et al. 2015; Wallace et al. 2013) or on inertial measurement units (Kurnikova et al. 2017; Pasquet et al. 2016), which rely instead on relatively large headsets that need to be powered to operate. As a result, these methods do not allow wireless recordings, and their application is limited to relatively large (usually open-field) arenas. By contrast, as demonstrated in our study (Figs. 6, 7, 8, and 9), our system can work in the small enclosures and operant boxes that are typical of perceptual studies, even when the animal is instructed to sample the sensory stimuli through a narrow viewing hole. Crucially, these practical advantages over other methods were achieved without compromising on tracking accuracy, which was comparable or superior to that reported for previous camera-based head trackers. Specifically, the overall mean absolute error achieved in Finkelstein et al. (2015) was 5.6° (i.e., an order of magnitude larger than the values obtained in our study), while Wallace et al. (2013) reported RMSEs in the order of those found for our system (i.e., 0.79° for pitch, 0.89° for roll, and 0.55° for yaw; compare with Fig. 4D).

Together, the lightness and compactness of our method, along with its suitability for wireless recordings, make our system ideal for high-throughput applications, where tens of animals are either continuously monitored in their home cages while engaged in spontaneous behaviors or are tested in parallel, across multiple operant boxes, in perceptual discrimination tasks (Aoki et al. 2017; Dhawale et al. 2017; Meier et al. 2011; Zoccolan 2015). At the same time, no major obstacles prevent our method from being used in more traditional contexts, i.e., in navigation studies employing open-field arenas, mazes, or linear tracks (Moser et al. 2008, 2015). This would simply require choosing a video camera with a larger field of view and a higher resolution, so as to image with sufficient definition the dot pattern over a wider area. In this regard, it is important to highlight that our method, despite being based on purely passive imaging, can successfully track head movements as fast as ~400 mm/s (Fig. 6G)—a speed similar to that reached by rats running on a linear track, as measured by imaging head-mounted LEDs (Davidson et al. 2009). Our method could also be extended to track in 3D the head's pose of multiple animals in the same environment—e.g., by adding to each pattern an additional dot with a distinctive color, coding for the identity of each animal. Alternatively, multiple synchronized cameras could be used to track the head of a single animal, either to extend the field over which the head is monitored (e.g., in case of very large arenas) or to mitigate the problem of occlusion (e.g., by the cables of the neuronal recording apparatus or by the presence of objects or other animals in the testing area), thus reducing the proportion of frames dropped because of segmentation failure (although this issue, at least in the context of the perceptual task tested in our study, is already properly addressed by the use of a single camera; see Fig. 5D).

With regard to our validation procedure, it should be noted that the collection of 30 repeated head-tracker measurements in rapid succession at each tested position/pose could in principle lead to an overestimation of the precision of our system. In fact, such validation would not take into account the


environmental changes that are likely to occur over the course of many hours or days of usage of the head tracker (e.g., because of mechanical drifts or changes of temperature of the camera). Given our goal of testing the head tracker over grids with tens of different positions/poses (see Figs. 3 and 4), it was obviously impossible to obtain many repetitions of the same measurements across delays of hours or days. However, we minimized the risk of overestimating the precision of our system by making sure that the camera was firmly bolted to the optical table where our measurements took place and by keeping the camera turned on for many consecutive days before performing our tests (with the camera and the whole testing rig located inside an air-conditioned room with tight control of the environmental temperature). This allowed the camera to operate at a stable working temperature and minimized mechanical drifts.

To conclude, because of the key features discussed above, we believe that our method will allow exploring the influence of head positional signals on neuronal representations in a variety of cognitive tasks, including not only navigation, but also sensory processing and perceptual decision making. In fact, there is a growing appreciation that, at least in the rodent brain, the representation of positional information is ubiquitous. Cells tuned for head direction have been found in unexpected places, such as the hippocampus (Acharya et al. 2016). An ever-increasing number of spatial signals (e.g., speed) appear to be encoded in entorhinal cortex (Kropff et al. 2015). A variety of positional, locomotor, and navigational signals has been found to be encoded even in sensory areas, such as primary visual cortex (Ayaz et al. 2013; Dadarlat and Stryker 2017; Niell and Stryker 2010; Saleem et al. 2013, 2018). So far, this positional information has been mainly studied in 2D, but the rodent brain likely encodes a much richer, 3D representation of the head, and studies exploring this possibility are under way. For instance, some authors recently reported the existence of neurons in rat V1 that linearly encode the animal's 3D head direction (i.e., the yaw, roll, and pitch angles) (Guitchounts et al. 2019). It is likely that such 3D head positional signals also affect higher-order visual association areas, such as those recently implicated in shape processing and object recognition (Kaliukhovich and Op de Beeck 2018; Matteucci et al. 2019; Tafazoli et al. 2017; Vinken et al. 2014, 2016, 2017), as well as the cortical regions involved in perceptual decision making, such as posterior parietal cortex (Licata et al. 2017; Nikbakht et al. 2018; Odoemene et al. 2018; Raposo et al. 2014). Overall, this indicates that estimation of the pose of the head in 3D will soon become an essential tool in the study of rodent cognition. This calls for the availability of reliable, fully validated, easy-to-use and freeware/open-source approaches for 3D head tracking, such as the method presented in this study.

One could wonder whether this problem could be solved, in an even less involved and more user-friendly way, by approaches that perform markerless tracking of animals or body parts using deep learning architectures (Insafutdinov et al. 2016; Mathis et al. 2018). Most notably, DeepLabCut has recently been shown to be able to segment and track the body and body parts of several animal species with great accuracy (Mathis et al. 2018). However, current tracking methods based on deep learning operate in the pixel space. They exploit the generalization power of deep networks trained for image classification (often referred to as transfer learning) to correctly segment, label, and track target features (e.g., a hand or a paw)

in the image plane. However, they are not meant to provide a geometric reconstruction of the pose of a feature in 3D, which is especially problematic if the goal is to measure rotations about the three Euler rotation axes. While it can be relatively easy to convert pixel displacements into metric displacements by acquiring proper calibration measurements, it is not obvious how pixel measures could be converted into angle estimates. In the spirit of deep learning, this would likely require feeding to a deep network many labeled examples of head poses with known rotation angles, but, without resorting to marker-based measurements, it is unclear how such ground-truth data could be collected in the first place. In summary, at least in the context of experiments that require estimating head rotations in 3D, videography methods that are based on tracking patterns of markers with known geometry are not easily replaceable by markerless approaches.

APPENDIX A

Point detection module. To solve the final PnP problem and estimate the pose of the dot pattern in the camera reference system, all the possible point candidates that represent a 2D projection configuration of the pattern must be considered. This requires extracting first the positions of all six points in each frame captured by the camera. The PD module of our algorithm takes care of this step by applying a Difference of Gaussians (DoG) filter, which is particularly efficient to compute, is rotationally invariant, and shows a good stability under projective transformation and illumination changes (Lowe 1991, 2004).

The DoG filter is an approximation of the well-known Laplacian of Gaussian (LoG) filter. It is defined as the difference between the images resulting from filtering a given input image I with two Gaussians having different sigma, σ1 and σ2 (Jähne 2005), i.e.: DoG = G(I,σ1) − G(I,σ2), where G(I,σ) is the convolution of I with a Gaussian filter G with parameter σ. When the size of the DoG kernel matches the size of a bloblike structure in the image, the response of the filter becomes maximal. The DoG kernel can therefore be interpreted as a matching filter (Duda et al. 2001). In our implementation, the ratio R = σ1/σ2 has been fixed to 2. Therefore, σ1 and σ2 can be written as

σ1 = σ·√2

σ2 = σ/√2

where σ can be interpreted as the size of the DoG kernel. In principle, σ should be set around r/√2, where r is the radius of a black dot as it appears on a frame imaged by the camera (Lindeberg 1998). However, the method is quite tolerant to variations of the distance of the dot pattern from the camera, and the dots are correctly detected even when they are at a working distance that is half of that originally set (i.e., when their size is twice as large as r). In addition, the software implementation of our head tracker includes a graphical user interface (GUI) that allows manually adjusting the value of σ, as well as of other key parameters (e.g., Ic and score; see next paragraph and Point-correspondences identification module and Perspective-n-Point module below), depending on the stability of the tracking procedure (as visually assessed by the user, in real time, through the GUI).
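A minimal sketch of this DoG step with OpenCV (our reconstruction, not the released tracker code); it also folds in the thresholded local-maxima selection described in the next paragraph, with sigma and Ic playing the roles discussed above:

import cv2
import numpy as np

def detect_dot_candidates(gray, sigma, Ic):
    """Return (x, y) candidate dot locations: local maxima of the DoG response
    above the contrast threshold Ic. Dark dots on a white background yield
    positive DoG values with sigma1 > sigma2."""
    img = gray.astype(np.float32)
    s1, s2 = sigma * np.sqrt(2.0), sigma / np.sqrt(2.0)
    dog = cv2.GaussianBlur(img, (0, 0), s1) - cv2.GaussianBlur(img, (0, 0), s2)
    local_max = cv2.dilate(dog, np.ones((3, 3), np.uint8))   # max over 3x3 window
    ys, xs = np.where((dog >= local_max) & (dog > Ic))
    return list(zip(xs, ys)), dog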

After detecting the candidate dots in the image plane using the DoG filter, we applied a nonmaxima suppression algorithm (Canny 1987) that rejects all candidate locations that are not local maxima or that are smaller than a contrast threshold Ic. Still, depending on the value of Ic, a large number of false positives can be found along the edges of the pattern. In fact, a common drawback of the DoG and LoG representations is that local maxima can also be detected in the neighborhood


of contours or straight edges, where the signal change is only in one direction. These maxima, however, are less stable, because their localization is more sensitive to noise or small changes in neighboring texture. A way to solve the problem of these false detections would be to analyze simultaneously the trace and the determinant of the Hessian matrix over a neighborhood of pixels in the image (Mikolajczyk and Schmid 2002). The trace of the Hessian matrix is equal to the LoG, but considering simultaneously the maxima of the determinant penalizes points for which the second derivatives detect signal changes in only one direction. Since a similar idea is explored in the Harris cornerness operator (Harris and Stephens 1988), our algorithm exploits the already calculated Gaussian-smoothed image and uses the efficient implementation of Harris corner detection available in the OpenCV library (https://opencv.org).

Given an image I, the Harris cornerness operator obtains the local image structure tensor H_Sp over a neighborhood of pixels Sp, where H_Sp is defined as

H_Sp = | Σ_Sp (∂I/∂x)²          Σ_Sp (∂I/∂x)(∂I/∂y) |
       | Σ_Sp (∂I/∂x)(∂I/∂y)    Σ_Sp (∂I/∂y)²        |

Here, ∂I/∂x and ∂I/∂y are the partial derivatives of the image intensity I along the two spatial axes x and y of the image plane, computed using a Sobel operator with an aperture of three pixels—i.e., a 3×3 filter that implements a smooth, discrete approximation of a first-order derivative (González and Woods 2008; Jähne 2005). H_Sp provides a robust distinction between edges and small blobs, because the difference det(H_Sp) − k·trace(H_Sp)² assumes values that are strictly negative on edges and positive on blob centers. This difference depends on three parameters: 1) the aperture of the Sobel filter, which, as mentioned above, was fixed to the minimal possible value (3) for the sake of speed of computation; 2) the weight k assigned to the trace of the H_Sp tensor, which, as suggested in Grauman and Leibe (2011), was set to 0.04, i.e., the default value of the OpenCV library (https://opencv.org); and 3) the block size of the squared neighborhood Sp, which was empirically set to 9 based on some pilot tests, where it showed good stability over a broad range of working distances (note that Sp could be extrapolated from the dot size, the working distance between the camera and the dot pattern, and the internal camera parameters, assuming an ideal condition of a clean white background around the dots).
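A sketch of this edge-rejection step using OpenCV's Harris implementation, with the parameter values quoted above (our reconstruction; the candidate list is assumed to come from the DoG step):

import cv2
import numpy as np

def keep_blob_like(gray, candidates, block_size=9, sobel_aperture=3, k=0.04):
    """Keep only candidates whose Harris response det - k*trace^2 is positive
    (negative responses are typical of edges, positive ones of blob/corner centers)."""
    harris = cv2.cornerHarris(np.float32(gray), block_size, sobel_aperture, k)
    return [(x, y) for (x, y) in candidates if harris[y, x] > 0]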

To summarize, the complete PD algorithm worked as follows. First, the DoG filter was applied to identify all candidate dots in the acquired image. Then, the nonmaxima suppression algorithm was used to prune some of the false positives around the maxima. Finally, for each of the remaining detected dots, the Harris cornerness operator H_Sp was computed over a neighborhood Sp centered on the position of each dot and, depending on the sign of the difference det(H_Sp) − k·trace(H_Sp)², the dot was rejected as a false positive or accepted as the projection on the image plane of one of the dots of the 3D pattern. As mentioned above, the contrast threshold Ic of the nonmaxima suppression algorithm was adjustable by the user through the same GUI used to adjust the σ of the DoG.

To conclude, it should be noted that, despite the pruning of false detections performed by the PD algorithm, many spurious dots are still identified in the image, in addition to those actually present on the pattern. This is because, in general, many dotlike features are present in any image. To further refine the identification of the actual dots of the pattern, it is necessary to take into account their relationship, given the geometry of the pattern itself. This is achieved by the PcI algorithm that is described in Point-correspondences identification module.

Point-correspondences identification module. Since projective transformations maintain straight lines, aligned triplets of dots (p1, p2, p3) in the 3D pattern must still be aligned in the images captured by the camera. Our PcI algorithm searches all aligned triplets of dots in an image and, to reduce the number of possible triplets, only those having a length (i.e., distance between the external points) smaller than D are considered. To identify a triplet, for any given pair of detected dots, we looked at whether a third, central dot was present in the proximity of the middle position between the pair (note that, in the case of a triplet, the offset of the projected central point with respect to the middle position is negligible, because the black dots are usually at a much shorter distance from each other than the working distance, so that an orthographic projection can be adopted). D is automatically computed from the physical distances of the external points of the triplets on the dot pattern, knowing the intrinsic parameters of the camera. It must be set to be slightly bigger than the maximum distance between the external points of the triplets on the image plane, when the pattern is positioned at the maximal working distance from the camera (10% bigger proved to be sufficient). This condition is achieved when the z-axis of the pattern is exactly aligned to the optical axis of the camera.

Once all candidate triplets have been identified, the algorithm looks for those having a common external point, which corresponds to the corner of the L-shaped arrangement of five coplanar dots in the pattern (Fig. 2A). The search for this L's corner is performed by considering 5-tuple point configurations to obtain the final correspondence assignment. In addition to collinearity, another important projective invariant is the angular ordering (on planes facing the view direction of the imaging device). That is, if we take three points defining a triangle, once we have established an ordering for them (either clockwise or counterclockwise), such ordering is maintained under any projective transformation that looks down on the same side of the plane (Bergamasco et al. 2011). In our framework, this implies evaluating the external product of the two vectors that start from the common point (i.e., the L's corner) and end on the respective external points. This establishes the orientation order and, consequently, uniquely assigns the five black-dot correspondences between the 3D pattern and its image.
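The two projective-invariant checks described above can be sketched as follows (our reconstruction; the midpoint tolerance is a hypothetical parameter, not taken from the paper):

import numpy as np

def is_triplet(p1, p_mid, p3, D, tol=3.0):
    """True if the end points are closer than D and a central dot lies near
    their midpoint (orthographic approximation, as argued in the text)."""
    p1, p_mid, p3 = (np.asarray(p, float) for p in (p1, p_mid, p3))
    if np.linalg.norm(p3 - p1) > D:
        return False
    return np.linalg.norm(p_mid - (p1 + p3) / 2.0) < tol

def angular_order(corner, end_a, end_b):
    """Sign of the 2D cross product of the two vectors leaving the L's corner;
    this sign (the angular ordering) is preserved under the projective
    transformations considered here and disambiguates the dot correspondences."""
    va = np.asarray(end_a, float) - np.asarray(corner, float)
    vb = np.asarray(end_b, float) - np.asarray(corner, float)
    return np.sign(va[0] * vb[1] - va[1] * vb[0])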

Following this assignment, the sixth dot belonging to the pattern (i.e., the one placed over the pillar; Fig. 2A) is searched in the proximity of the L's corner. To this aim, the maximal distance between the dot on the pillar and the L's corner (DP) in the image plane is automatically estimated (in the same way as D) from the actual distance between the two dots in the 3D pattern, knowing the camera's internal parameters and its maximal working distance from the pattern. This yields a set of candidate pillar dots. Finding the correct one requires evaluating each candidate dot, in conjunction with the other five coplanar dots, in terms of its ability to minimize the reprojection error computed by solving the PnP problem (see Perspective-n-Point module below). It should be noted that the correct identification of the sixth point on the pillar is fundamental, since it is the only point out of the plane that allows the PnP problem to be solved robustly. The reprojection error is defined as the sum of the norm distances between the estimated positions of the dots in the image and the projections of the physical dots of the 3D pattern on the image plane, under a given assumed pose of the pattern and knowing the camera's calibration parameters. More specifically, during the sixth-point selection we defined the score of a given dot configuration as

score = 100 − S · Reprojection_error (A1)

where S is a proper scalar factor established experimentally (in our application, we set S = 5). To understand the meaning of S, let's suppose, for instance, that we have a distance error of one pixel for each dot, thus yielding a reprojection error of 6. Without the S scaling factor (i.e., with S = 1), we would obtain a score of 94. However, the error on each dot is typically well below one pixel (see next paragraph about the way to estimate the dot coordinates with subpixel accuracy), and the score would therefore always be close to 100. Hence the need to introduce the factor S to rescale the score, so that it can range between 90 and 100. As mentioned above, during the selection

As mentioned above, during the selection procedure, the sixth point on the pillar was chosen among the candidates as the one that maximized the score defined above. In fact, each hypothesis about the position of the sixth point yielded a PnP transformation, for which it was possible to compute the reprojection error. Note that, to eliminate some possible ambiguities in the selection of the sixth point, we also exploited the a priori knowledge about the direction of the pillar (which must point toward the camera sensor).
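
The scoring of Eq. A1 can be sketched as follows with the OpenCV PnP solver; the function name and the way candidates are looped over are illustrative choices, not the authors' actual implementation:

```python
import cv2
import numpy as np

S = 5.0  # empirical scaling factor of Eq. A1

def candidate_score(obj_pts, img_pts, K, dist_coeffs):
    """Score a 6-dot correspondence hypothesis (Eq. A1): solve PnP for the
    hypothesized configuration and penalize the summed reprojection error."""
    obj_pts = np.asarray(obj_pts, np.float32).reshape(-1, 3)
    img_pts = np.asarray(img_pts, np.float32).reshape(-1, 2)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist_coeffs,
                                  flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return -np.inf
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist_coeffs)
    err = np.linalg.norm(proj.reshape(-1, 2) - img_pts, axis=1).sum()
    return 100.0 - S * err

# The pillar dot is the candidate that maximizes the score, e.g.:
# best = max(candidates, key=lambda c: candidate_score(model_6dots,
#                                                      np.vstack([five_dots, c]),
#                                                      K, dist_coeffs))
```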

As expected, given how sensitive the PnP procedure is to small variations in the estimated positions of the dots, we empirically verified that pixel-level accuracy was not sufficient to guarantee high precision and stability in our pose estimates. For this reason, we estimated the positions of the centers of the dots at the subpixel level by three-point Gaussian approximation (Naidu and Fisher 1991). This method considers the three highest, contiguous intensity values (along either the x or y spatial axis) within a region of the image that has been identified as one of the dots (i.e., a blob) and assumes that the shape of the observed peak fits a Gaussian profile. This assumption is reasonable, because the sensor integration over a small area of pixels containing a blob, after the DoG filtering, produces smooth profiles very similar to Gaussian profiles. If a, b, and c are the intensity values observed at pixel positions x − 1, x, and x + 1, with b having the highest value, then the subpixel location (xs) of the peak is given by

xs = x + (1/2) · [ln(a) − ln(c)] / [ln(a) + ln(c) − 2·ln(b)]

where x is the x-coordinate of the center of the pixel with intensity value b. The same approximation is applied to obtain the subpixel y-coordinate ys of the dot center.

Perspective-n-Point module. In computer vision, the problem of estimating the position and orientation of an object with respect to a perspective camera, given its intrinsic parameters obtained from calibration and a set of world-to-image correspondences, is known as the Perspective-n-Point camera pose problem (PnP) (Fiore 2001; Lowe 1991). Given a number of 2D-3D point correspondences mi↔Mi (where mi = [u v]′ are the 2D coordinates of point i over the image plane, and Mi = [X Y Z]′ are the 3D coordinates of point i in the physical environment) and the matrix K with the intrinsic camera parameters (see definition below), the PnP problem requires finding 1) a rotation matrix R that defines the orientation of the object (i.e., of the pattern reference system x′, y′, z′ in our case) with respect to the camera reference system x, y, z (see Eq. 1 in the RESULTS and Fig. 2C); and 2) a translation vector t that specifies the Cartesian coordinates of the center of the object (i.e., of the origin O′ of the pattern reference system in our case) in the camera reference system, such that

m̃i ≃ K [R t] M̃i for all i,    (A2)

where ~ denotes homogeneous coordinates (Jähne 2005) and ≃ defines an equation up to a scale factor. Specifically:

m̃i = [u v 1]′

K = | fx  0   cx |
    | 0   fy  cy |
    | 0   0   1  |

[R t] = | r11  r12  r13  t1 |
        | r21  r22  r23  t2 |
        | r31  r32  r33  t3 |

M̃i = [X Y Z 1]′,

where fx and fy are the focal lengths and cx and cy are the coordinates of the principal point of the camera lens.

To solve the PnP problem, all methods have to face a tradeoff between speed and accuracy. Direct methods, such as the Direct Linear Transform, find a solution to a system of linear equations as in Eq. A2 and are usually faster but less accurate than iterative methods. On the other hand, iterative methods that explicitly minimize a meaningful geometric error, such as the reprojection error, are more accurate but slower (Garro et al. 2012). In our application, we adopted the method known as EPnP (Lepetit et al. 2009), followed by an iterative refinement. The EPnP approach is based on a noniterative solution to the PnP problem, and its computational complexity grows linearly with n, where n is the number of point correspondences. The method is applicable for all n ≥ 4 and properly handles both planar and nonplanar configurations. The central idea is to express the n 3D points as a weighted sum of four virtual control points. Since high precision is required in our setup, the output of the closed-form solution given by EPnP was used to initialize an iterative Levenberg-Marquardt scheme (More 1977), which finds the pose that minimizes the reprojection error, thus improving the accuracy with a negligible amount of additional time. Both EPnP and the Levenberg-Marquardt iterative scheme are available in the OpenCV library (https://opencv.org) and are extremely efficient. The execution time on our HP Workstation Z620 was on the order of a few milliseconds.
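
A compact sketch of this two-step scheme using the OpenCV Python bindings (solvePnPRefineLM is available in recent OpenCV releases; older versions can use SOLVEPNP_ITERATIVE with useExtrinsicGuess=True instead) might read:

```python
import cv2
import numpy as np

def estimate_pose(obj_pts, img_pts, K, dist_coeffs):
    """Closed-form EPnP solution, refined by a Levenberg-Marquardt iteration
    that minimizes the reprojection error (a sketch of the two-step scheme
    described above, not the authors' exact code)."""
    obj_pts = np.asarray(obj_pts, dtype=np.float32).reshape(-1, 3)
    img_pts = np.asarray(img_pts, dtype=np.float32).reshape(-1, 2)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist_coeffs,
                                  flags=cv2.SOLVEPNP_EPNP)
    rvec, tvec = cv2.solvePnPRefineLM(obj_pts, img_pts, K, dist_coeffs,
                                      rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix
    return R, tvec               # pose of the pattern in camera coordinates
```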

Camera calibration procedure. Camera calibration or, more precisely, camera resectioning is the process of estimating the parameters of a pinhole camera model approximating the true camera that produced a given image or set of images. With the exception of the so-called self-calibration methods, which try to estimate the parameters by exploiting only point correspondences among images, a calibration object having a known, precise geometry is needed. In fact, self-calibration cannot usually achieve an accuracy comparable to that obtained with a calibration object, because it needs to estimate a large number of parameters, resulting in a much harder mathematical problem (Zhang 2000).

Much progress has been made, starting in the photogrammetry community and more recently in the field of computer vision, in developing object-based calibration methods. In general, these approaches can be classified into two major categories, based on the number of dimensions of the calibration objects: 1) 3D object-based calibration, where camera calibration is performed by observing a calibration object whose geometry in 3D space is known with very good precision; and 2) 2D plane-based calibration, which is based on imaging a planar pattern shown at a few different orientations. In our application, we adopted this second option, because it has proven to be the best choice in most situations, given its ease of use and good accuracy. Specifically, we adopted the method of Zhang (2000), available in the OpenCV library (https://opencv.org), which, in its iterative process, also estimates some lens distortion coefficients (see the following paragraphs for a discussion of the distortion).

The fundamental equation to achieve the calibration is the same as in the PnP problem (i.e., Eq. A2), and the iterative solution is based on minimizing the reprojection error defined in Eq. A1. However, in the case of camera calibration, the matrix K with the intrinsic parameters is also unknown, in addition to R and t. As such, solving the equation is, in principle, harder. However, the algorithm used to minimize the reprojection error does not need to run in real time, since the calibration is performed before the camera is used for head tracking. In addition, the point correspondences mi↔Mi over which the error is computed and minimized are on the order of several hundreds, which makes the estimation of the parameters very robust and reliable.

In our application, these points were the intersections of the 9×7 squares of a planar checkerboard (shown in Fig. 4A) imaged in 15 different poses/positions. Specifically, a few snapshots were taken centrally at different distances from the camera, others spanned the image plane to sample the space where the radial distortion is more prominent, and finally (and most importantly) other snapshots were taken from different angles of orientation with respect to the image plane (Zhang 2000).
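
For reference, a minimal OpenCV sketch of this plane-based calibration is given below; the image folder name and the square size are placeholders, not values from the paper, and a 9×7-square board exposes 8×6 inner corners:

```python
import glob
import cv2
import numpy as np

pattern_size = (8, 6)   # inner corners of a 9x7-square checkerboard
square_mm = 10.0        # hypothetical physical square size

# 3D coordinates of the corners in the checkerboard reference frame
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points = [], []
for fname in glob.glob('calibration/*.png'):          # hypothetical image folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Zhang's method; the default model returns the five coefficients k1, k2, p1, p2, k3
rmse, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print('RMSE (pixels):', rmse)   # comparable to the last column of Table A1
```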

To measure the effect of changing the calibration images on the pose estimation, we collected a set of 80 images of the calibration checkerboard at different orientations. Fifty random subsamples (without replacement), each composed of 50% of the total images, were used to calibrate the system, thus yielding 50 different calibrations. In Table A1, we report the mean and standard deviation of the internal parameters and the lens distortion coefficients obtained from these calibrations. In Table A2, we report the averages and standard deviations of the contributions of the radial and tangential parts of the distortion. Some of the parameters reported in these tables (i.e., fx, fy, cx, and cy) have been defined in Eq. A2, while the other parameters (k1, k2, k3, p1, and p2) are distortion coefficients described in the next paragraphs.
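
The subsampling analysis can be sketched as follows, reusing the obj_points, img_points, and image size collected in the calibration sketch above (the random seed and variable names are arbitrary):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
n_images, n_runs = 80, 50
image_size = gray.shape[::-1]            # from the calibration sketch above

stats = []
for _ in range(n_runs):
    idx = rng.choice(n_images, size=n_images // 2, replace=False)
    sub_obj = [obj_points[i] for i in idx]
    sub_img = [img_points[i] for i in idx]
    rmse, K, dist, _, _ = cv2.calibrateCamera(sub_obj, sub_img, image_size, None, None)
    # collect fx, fy, cx, cy, the five distortion coefficients, and the RMSE
    stats.append(np.r_[K[0, 0], K[1, 1], K[0, 2], K[1, 2], dist.ravel(), rmse])

stats = np.array(stats)
print('mean:', stats.mean(axis=0))       # cf. Table A1
print('SD:  ', stats.std(axis=0))
```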

The focal lengths fx and fy (Table A1) showed good stability. The coordinates of the principal point (cx, cy), as is well known, are some of the hardest parameters to estimate and, in our case, had a standard deviation of ~4.5 pixels. In any event, they are additive terms and, as such, do not affect distance and angle measurements. The remaining parameters describe the distortion introduced by the lens during image formation. We quantified the effect of distortion, both the radial and the tangential part, on the pose estimation by considering our specific setup.

From Eq. A2, after the roto-translation transformation [x y z] = [R t][X Y Z 1]′, we have the 2D homogeneous coordinates

x′ = x / z,   y′ = y / z.

Then, the distorted coordinates x″ and y″ can be modeled by the parameters k1, k2, k3, p1, and p2 using the following equations:

x″ = x′(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x′·y′ + p2·(r² + 2·x′²)
y″ = y′(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p2·x′·y′ + p1·(r² + 2·y′²)    (A3)

where r² = x′² + y′². The final 2D image coordinates are then obtained as s·[u v 1]′ = K·[x″ y″ 1]′.

In the worst-case scenario, namely in the corners of the image where the distortion is at its maximum, x′ and y′ can be approximated by dSx/2f and dSy/2f, where f is the focal length of the lens and dSx and dSy are the x and y dimensions of the CCD sensor. By simple geometrical considerations, since our camera mounts a 1/3″ CMOS sensor and the lens has a focal length of 12.5 mm, r² results equal to 0.0576. The distortion corrections were then calculated for every sample considering this worst case. The radial distortion correction (k1·r² + k2·r⁴ + k3·r⁶) dominates the tangential part (2·p1·x′·y′ + p2·(r² + 2·x′²)) but, more importantly, it has a standard deviation of just ~0.1% (see Table A2).

In this framework, it is clear that the calibration is very stable, with parameters fluctuating by ~0.1% for fx, fy and the distortion coefficients, whereas cx and cy do not affect the displacement or angular measurements. Conversely, an accurate measurement of the physical distances between the dots of the pattern is crucial. We verified that an error of 5% in the dot distances produces errors of 5% in the pose estimates (for example, if the distances are estimated to be 5% bigger than the truth, the dot pattern is estimated to be at a pose 5% more distant from the camera) and, consequently, the translation measurements are affected.

APPENDIX B

Accuracy measurements of the custom assemblies used to position the dot pattern. The accuracy of the placement of the dots' pattern over the 5×5 breadboard shown in Fig. 3A was computed from the known accuracy of the 3D printer used to build the board. Specifically, our 3D printer can make at most an error of ±0.05 mm (±0.025 mm) every 25.4 mm of printed material along the x and y directions (the error decreases to 0.025 mm for the z-axis). Given the 150-mm span along the x-axis and the 100-mm span along the y-axis, this yields, in the worst-case scenario in which the errors keep accumulating without canceling each other, a maximal expected error of ±0.15 mm along x and ±0.10 mm along y.
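
For concreteness, the quoted worst-case bounds are reproduced by accumulating a ±0.025 mm error for every 25.4 mm of printed material, which is one reading of the printer specification above:

```python
# Worst-case tolerance accumulation over the breadboard spans (an illustrative
# calculation, assuming an error rate of +/-0.025 mm per 25.4 mm of material)
err_per_inch = 0.025                 # mm of error per 25.4 mm printed
print(150 / 25.4 * err_per_inch)     # ~0.15 mm along x
print(100 / 25.4 * err_per_inch)     # ~0.10 mm along y
```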

Computing the precision and accuracy of the custom assembly (made of two 3D-printed goniometers and a rotary stage) that was used to rotate the dot pattern in 3D (Fig. 4B) was more involved. It required building a validation setup, such that actual and desired angular displacements could first be transformed into linear shifts for easier comparison. To validate the yaw and pitch rotations, the setup consisted of two facing plates, mounted vertically on an optical breadboard table (Kinetic Systems Vibraplane) and held perpendicularly to the table with two rectified vertical brackets (Thorlabs VB01B/M; see Supplemental Fig. S1A: https://doi.org/10.5281/zenodo.3429691). On the surface of one of the plates we placed a millimeter-graduated paper, while on the other plate we mounted the custom assembly, but with a laser source instead of the dots' pattern. The rationale of the system is that, once the center of rotation of each goniometer is known, each performed (nominal) rotation results in a specific linear displacement of the laser beam over the millimeter-graduated paper. Such linear displacement can be converted back to the actual angle θ of the rotation using the following trigonometric identity:

θ = tan⁻¹(l / d)

where l is the linear displacement and d is the distance between the rotation center of the goniometer used to perform the rotation and the millimeter-graduated paper, along the axis that is perpendicular to the latter.

Table A1. Mean values and standard deviations of the internal camera parameters and distortion coefficients

        fx       fy       cx     cy     k1      k2    p1       p2       k3      RMSE, pixels
Mean    2,701.7  2,707.4  636.6  508.9  −0.396  2.23  0.00098  −0.0019  −26.37  0.10
SD      2.51     1.98     4.73   4.29   0.012   0.78  0.00018  0.00025  15.50   0.00094

The reprojection error, namely the distance between the squares' corners in the images and their reprojected points after the calibration, is reported in the last column. fx and fy are the focal lengths and cx and cy are the coordinates of the principal point of the camera lens (see Eqs. A2 and A3); k1, k2, k3, p1, and p2 are the distortion coefficients used in Eq. A3. RMSE, root mean square error.

Table A2. Mean values and standard deviations of the two components of the distortion, i.e., the radial and the tangential part

        Radial Distortion   Tangential Distortion
Mean    −0.02048            −0.00020
SD      0.0010              2.49e-05

The radial distortion of −0.02048 indicates that the pixels in the corners of the image appear ~2% closer to the center of the image (barrel distortion).

Thus, by performing several repeated nominal rotations, targeting various combinations of desired yaw and pitch angles, it was possible to establish both the precision and the accuracy of the performed rotations. Specifically, we set the yaw and pitch angles at the following pairs of values: (1°, 1°), (3°, 3°), (5°, 5°), (8°, 8°), and (10°, 10°) in the top right quadrant; (1°, −1°), (3°, −3°), etc. in the bottom right quadrant; (−1°, −1°), (−3°, −3°), etc. in the bottom left quadrant; and (−1°, 1°), (−3°, 3°), etc. in the top left quadrant. Each target rotation was repeated three times, thus yielding a total of 75 validation measurements, from which we computed the accuracy and precision of the performed rotations (in terms of RMSE) as 0.07° and 0.02° for the yaw and 0.05° and 0.03° for the pitch.

For the roll angle, we followed a similar approach, but the plate with the millimeter-graduated paper and the plate mounting the custom assembly were placed perpendicularly to each other, rather than parallel (see Supplemental Fig. S1B: https://doi.org/10.5281/zenodo.3429691). This allowed rotating the rotary stage used to produce the roll displacements in such a way that the rotations resulted in vertical linear shifts over the millimeter-graduated paper. Specifically, we set the roll angle to the following values: 1°, 3°, 5°, 8°, and 10°, as well as −1°, −3°, −5°, −8°, and −10°, and we repeated each rotation three times. This yielded an estimate of the accuracy and precision of the roll displacements of 0.13° and 0.04° RMSE.
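
A sketch of how each validation measurement can be converted into an angle and summarized follows; the particular split of accuracy and precision into RMSE about the nominal target and RMSE about the per-target mean is an assumption of this sketch, not a detail given in the text:

```python
import numpy as np

def measured_angle(l_mm, d_mm):
    """Actual rotation angle (degrees) recovered from the linear displacement l
    of the laser spot on the graduated paper, at distance d from the rotation
    center (the trigonometric identity above)."""
    return np.degrees(np.arctan2(l_mm, d_mm))

def accuracy_and_precision(nominal_deg, measured_deg):
    """Accuracy as RMSE of measured vs. nominal angles; precision as RMSE of the
    repeated measurements around their per-target mean."""
    nominal_deg = np.asarray(nominal_deg, float)
    measured_deg = np.asarray(measured_deg, float)
    accuracy = np.sqrt(np.mean((measured_deg - nominal_deg) ** 2))
    residuals = []
    for target in np.unique(nominal_deg):
        reps = measured_deg[nominal_deg == target]
        residuals.extend(reps - reps.mean())
    precision = np.sqrt(np.mean(np.square(residuals)))
    return accuracy, precision
```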

ACKNOWLEDGMENTS

Present address for D. Bertolini: VIVISOL (SOL group), Milan, Italy.
Present address for A. Perissinotto: Department of Biomedical Engineering, Tel Aviv University, Ramat Aviv, Israel.

GRANTS

This work was supported by a Human Frontier Science Program Grant to D. Zoccolan (contract n. RGP0015/2013), a European Research Council Consolidator Grant to D. Zoccolan (project n. 616803-LEARN2SEE), and a FSE POR Regione autonoma FVG Program Grant (HEaD - Higher education and development) to W. Vanzella.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

W.V., N.G., D.B., and D.Z. conceived and designed research; W.V., N.G., D.B., A.P., and M.G. performed experiments; W.V., N.G., D.B., A.P., and M.G. analyzed data; W.V., N.G., D.B., M.G., and D.Z. interpreted results of experiments; W.V., N.G., and D.Z. drafted manuscript; W.V., N.G., and D.Z. edited and revised manuscript; N.G., D.B., M.G., and D.Z. prepared figures; D.Z. approved final version of manuscript.

REFERENCES

Acharya L, Aghajan ZM, Vuong C, Moore JJ, Mehta MR. Causal influence of visual cues on hippocampal directional selectivity. Cell 164: 197–207, 2016. doi:10.1016/j.cell.2015.12.015.

Alemi-Neissi A, Rosselli FB, Zoccolan D. Multifeatural shape processing in rats engaged in invariant visual object recognition. J Neurosci 33: 5939–5956, 2013. doi:10.1523/JNEUROSCI.3629-12.2013.

Aoki R, Tsubota T, Goya Y, Benucci A. An automated platform for high-throughput mouse behavior and physiology with voluntary head-fixation. Nat Commun 8: 1196, 2017. doi:10.1038/s41467-017-01371-0.

Aragão RS, Rodrigues MAB, de Barros KMFT, Silva SRF, Toscano AE, de Souza RE, Manhães-de-Castro R. Automatic system for analysis of locomotor activity in rodents—A reproducibility study. J Neurosci Methods 195: 216–221, 2011. doi:10.1016/j.jneumeth.2010.12.016.

Attanasi A, Cavagna A, Del Castello L, Giardina I, Grigera TS, Jelic A, Melillo S, Parisi L, Pohl O, Shen E, Viale M. Information transfer and behavioural inertia in starling flocks. Nat Phys 10: 615–698, 2014. doi:10.1038/nphys3035.

Ayaz A, Saleem AB, Schölvinck ML, Carandini M. Locomotion controls spatial integration in mouse visual cortex. Curr Biol 23: 890–894, 2013. doi:10.1016/j.cub.2013.04.012.

Bassett JP, Taube JS. Neural correlates for angular head velocity in the rat dorsal tegmental nucleus. J Neurosci 21: 5740–5751, 2001. doi:10.1523/JNEUROSCI.21-15-05740.2001.

Berg HC, Brown DA. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239: 500–504, 1972. doi:10.1038/239500a0.

Bergamasco F, Albarelli A, Torsello A. Image-space marker detection and recognition using projective invariants. 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 2011: 381–388, 2011. doi:10.1109/3DIMPVT.2011.55.

Bossens C, Op de Beeck HP. Linear and non-linear visual feature learning in rat and humans. Front Behav Neurosci 10: 235, 2016. doi:10.3389/fnbeh.2016.00235.

Canny J. A computational approach to edge detection. In: Readings in Computer Vision, edited by Fischler MA, Firschein O. San Francisco, CA: Morgan Kaufmann, 1987, p. 184–203.

Cavagna A, Conti D, Creato C, Del Castello LD, Giardina I, Grigera TS, Melillo S, Parisi L, Viale M. Dynamic scaling in natural swarms. Nat Phys 13: 914–918, 2017. doi:10.1038/nphys4153.

Dadarlat MC, Stryker MP. Locomotion enhances neural encoding of visual stimuli in mouse V1. J Neurosci 37: 3764–3775, 2017. doi:10.1523/JNEUROSCI.2728-16.2017.

Davidson TJ, Kloosterman F, Wilson MA. Hippocampal replay of extended experience. Neuron 63: 497–507, 2009. doi:10.1016/j.neuron.2009.07.027.

De Keyser R, Bossens C, Kubilius J, Op de Beeck HP. Cue-invariant shape recognition in rats as tested with second-order contours. J Vis 15: 14, 2015. doi:10.1167/15.15.14.

Dhawale AK, Poddar R, Wolff SB, Normand VA, Kopelowitz E, Ölveczky BP. Automated long-term recording and analysis of neural activity in behaving animals. eLife 6: e27702, 2017. doi:10.7554/eLife.27702.

Djurdjevic V, Ansuini A, Bertolini D, Macke JH, Zoccolan D. Accuracy of rats in discriminating visual objects is explained by the complexity of their perceptual strategy. Curr Biol 28: 1005–1015.e5, 2018. doi:10.1016/j.cub.2018.02.037.

Duda RO, Hart PE, Stork DG. Pattern Classification. New York: Wiley, 2001.

Finkelstein A, Derdikman D, Rubin A, Foerster JN, Las L, Ulanovsky N. Three-dimensional head-direction coding in the bat brain. Nature 517: 159–164, 2015. doi:10.1038/nature14031.

Fiore PD. Efficient linear solution of exterior orientation. IEEE Trans Pattern Anal Mach Intell 23: 140–148, 2001. doi:10.1109/34.908965.

Fyhn M, Molden S, Witter MP, Moser EI, Moser M-B. Spatial representation in the entorhinal cortex. Science 305: 1258–1264, 2004. doi:10.1126/science.1099901.

Garcia-Perez E, Mazzoni A, Zoccolan D, Robinson HP, Torre V. Statistics of decision making in the leech. J Neurosci 25: 2597–2608, 2005. doi:10.1523/JNEUROSCI.3808-04.2005.

Garro V, Crosilla F, Fusiello A. Solving the PnP problem with anisotropic orthogonal Procrustes analysis. 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 2012: 262–269, 2012. doi:10.1109/3DIMPVT.2012.40.

González RC, Woods RE. Digital Image Processing. Upper Saddle River, NJ: Pearson/Prentice Hall, 2008.

Gower JC, Dijksterhuis GB. Procrustes Problems. Oxford, UK: Oxford University Press, 2004.

Grauman K, Leibe B. Visual Object Recognition. San Rafael, CA: Morgan & Claypool, 2011.

Guitchounts G, Masis J, Wolff SB, Cox DD. Head direction and orienting-suppression signals in primary visual cortex of freely moving rats. Computational and Systems Neuroscience (Cosyne) 2019. Lisbon, Portugal, February 28–March 5, 2019.

Hafting T, Fyhn M, Molden S, Moser M-B, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature 436: 801–806, 2005. doi:10.1038/nature03721.

Harris C, Stephens M. A combined corner and edge detector. In: Proceedings of the Alvey Vision Conference, edited by Taylor CJ. Manchester, UK: Alvey Vision Club, 1988, p. 23.1–23.6.

Hauser MFA, Wiescholleck V, Colitti-Klausnitzer J, Bellebaum C, Manahan-Vaughan D. Event-related potentials evoked by passive visuospatial perception in rats and humans reveal common denominators in information processing. Brain Struct Funct 224: 1583–1597, 2019. doi:10.1007/s00429-019-01854-4.

Horner AE, Heath CJ, Hvoslef-Eide M, Kent BA, Kim CH, Nilsson SRO, Alsiö J, Oomen CA, Holmes A, Saksida LM, Bussey TJ. The touchscreen operant platform for testing learning and memory in rats and mice. Nat Protoc 8: 1961–1984, 2013. doi:10.1038/nprot.2013.122.

Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Computer Vision—ECCV 2016, edited by Leibe B, Matas J, Sebe N, Welling M. Cham, Switzerland: Springer International, 2016, p. 34–50.

Jähne B. Digital Image Processing. Berlin, Germany: Springer Science & Business Media, 2005.

Kaliukhovich DA, Op de Beeck H. Hierarchical stimulus processing in rodent primary and lateral visual cortex as assessed through neuronal selectivity and repetition suppression. J Neurophysiol 120: 926–941, 2018. doi:10.1152/jn.00673.2017.

Kemp A, Manahan-Vaughan D. Passive spatial perception facilitates the expression of persistent hippocampal long-term depression. Cereb Cortex 22: 1614–1621, 2012. doi:10.1093/cercor/bhr233.

Kimmel DL, Mammo D, Newsome WT. Tracking the eye non-invasively: simultaneous comparison of the scleral search coil and optical tracking techniques in the macaque monkey. Front Behav Neurosci 6: 49, 2012. doi:10.3389/fnbeh.2012.00049.

Knutsen PM, Derdikman D, Ahissar E. Tracking whisker and head movements in unrestrained behaving rodents. J Neurophysiol 93: 2294–2301, 2005. doi:10.1152/jn.00718.2004.

Kropff E, Carmichael JE, Moser M-B, Moser EI. Speed cells in the medial entorhinal cortex. Nature 523: 419–424, 2015. doi:10.1038/nature14622.

Kurnikova A, Moore JD, Liao S-M, Deschênes M, Kleinfeld D. Coordination of orofacial motor actions into exploratory behavior by rat. Curr Biol 27: 688–696, 2017. doi:10.1016/j.cub.2017.01.013.

Kurylo DD, Chung C, Yeturo S, Lanza J, Gorskaya A, Bukhari F. Effects of contrast, spatial frequency, and stimulus duration on reaction time in rats. Vision Res 106: 20–26, 2015. doi:10.1016/j.visres.2014.10.031.

Kurylo DD, Yeturo S, Lanza J, Bukhari F. Lateral masking effects on contrast sensitivity in rats. Behav Brain Res 335: 1–7, 2017. doi:10.1016/j.bbr.2017.07.046.

Lepetit V, Moreno-Noguer F, Fua P. EPnP: an accurate O(n) solution to the PnP problem. Int J Comput Vis 81: 155–166, 2009. doi:10.1007/s11263-008-0152-6.

Licata AM, Kaufman MT, Raposo D, Ryan MB, Sheppard JP, Churchland AK. Posterior parietal cortex guides visual decisions in rats. J Neurosci 37: 4954–4966, 2017. doi:10.1523/JNEUROSCI.0105-17.2017.

Lindeberg T. Feature detection with automatic scale selection. Int J Comput Vis 30: 79–116, 1998. doi:10.1023/A:1008045108935.

Lowe D. Fitting parameterized three-dimensional models to images. IEEE Trans Pattern Anal Mach Intell 13: 441–450, 1991. doi:10.1109/34.134043.

Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60: 91–110, 2004. doi:10.1023/B:VISI.0000029664.99615.94.

Mar AC, Horner AE, Nilsson SR, Alsiö J, Kent BA, Kim CH, Holmes A, Saksida LM, Bussey TJ. The touchscreen operant platform for assessing executive function in rats and mice. Nat Protoc 8: 1985–2005, 2013. doi:10.1038/nprot.2013.123.

Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN, Mathis MW, Bethge M. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci 21: 1281–1289, 2018. doi:10.1038/s41593-018-0209-y.

Matteucci G, Bellacosa Marotti R, Riggi M, Rosselli FB, Zoccolan D. Nonlinear processing of shape information in rat lateral extrastriate cortex. J Neurosci 39: 1649–1670, 2019. doi:10.1523/JNEUROSCI.1938-18.2018.

Mazzoni A, Garcia-Perez E, Zoccolan D, Graziosi S, Torre V. Quantitative characterization and classification of leech behavior. J Neurophysiol 93: 580–593, 2005. doi:10.1152/jn.00608.2004.

Meier P, Flister E, Reinagel P. Collinear features impair visual detection by rats. J Vis 11: 22, 2011. doi:10.1167/11.3.22.

Mersch DP, Crespi A, Keller L. Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science 340: 1090–1093, 2013. doi:10.1126/science.1234316.

Mikolajczyk K, Schmid C. An affine invariant interest point detector. In: Computer Vision—ECCV 2002, edited by Heyden A, Sparr G, Nielsen M, Johansen P. Berlin, Germany: Springer, 2002, p. 128–142.

More J. The Levenberg-Marquardt algorithm: implementation and theory. In: Lecture Notes in Mathematics 630: Numerical Analysis, edited by Watson G. Berlin, Germany: Springer, 1977, p. 105–116.

Moser EI, Kropff E, Moser M-B. Place cells, grid cells, and the brain's spatial representation system. Annu Rev Neurosci 31: 69–89, 2008. doi:10.1146/annurev.neuro.31.061307.090723.

Moser M-B, Rowland DC, Moser EI. Place cells, grid cells, and memory. Cold Spring Harb Perspect Biol 7: a021808, 2015. doi:10.1101/cshperspect.a021808.

Naidu DK, Fisher RB. A comparative analysis of algorithms for determining the peak position of a stripe to sub-pixel accuracy. In: BMVC91: Proceedings of the British Machine Vision Conference, edited by Mowforth P. London, UK: Springer, 1991, p. 217–225.

Niell CM, Stryker MP. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65: 472–479, 2010. doi:10.1016/j.neuron.2010.01.033.

Nikbakht N, Tafreshiha A, Zoccolan D, Diamond ME. Supralinear and supramodal integration of visual and tactile signals in rats: psychophysics and neuronal mechanisms. Neuron 97: 626–639.e8, 2018. doi:10.1016/j.neuron.2018.01.003.

O'Keefe J, Dostrovsky J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res 34: 171–175, 1971. doi:10.1016/0006-8993(71)90358-1.

O'Keefe J, Nadel L. The Hippocampus as a Cognitive Map. Oxford, UK: Oxford University Press, 1978.

Odoemene O, Pisupati S, Nguyen H, Churchland AK. Visual evidence accumulation guides decision-making in unrestrained mice. J Neurosci 38: 10143–10155, 2018. doi:10.1523/JNEUROSCI.3478-17.2018.

Pasquet MO, Tihy M, Gourgeon A, Pompili MN, Godsil BP, Léna C, Dugué GP. Wireless inertial measurement of head kinematics in freely-moving rats. Sci Rep 6: 35689, 2016. doi:10.1038/srep35689.

Payne HL, Raymond JL. Magnetic eye tracking in mice. eLife 6: e29222, 2017. doi:10.7554/eLife.29222.

Perkon I, Košir A, Itskov PM, Tasic J, Diamond ME. Unsupervised quantification of whisking and head movement in freely moving rodents. J Neurophysiol 105: 1950–1962, 2011. doi:10.1152/jn.00764.2010.

Pinnell RC, Almajidy RK, Kirch RD, Cassel JC, Hofmann UG. A wireless EEG recording method for rat use inside the water maze. PLoS One 11: e0147730, 2016. doi:10.1371/journal.pone.0147730.

Quiroga RQ, Nadasdy Z, Ben-Shaul Y. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput 16: 1661–1687, 2004. doi:10.1162/089976604774201631.

Raposo D, Kaufman MT, Churchland AK. A category-free neural population supports evolving demands during decision-making. Nat Neurosci 17: 1784–1792, 2014. doi:10.1038/nn.3865.

Redish AD. Vicarious trial and error. Nat Rev Neurosci 17: 147–159, 2016. doi:10.1038/nrn.2015.30.

Remmel RS. An inexpensive eye movement monitor using the scleral search coil technique. IEEE Trans Biomed Eng 31: 388–390, 1984. doi:10.1109/TBME.1984.325352.

Rigosa J, Lucantonio A, Noselli G, Fassihi A, Zorzin E, Manzino F, Pulecchi F, Diamond ME. Dye-enhanced visualization of rat whiskers for behavioral studies. eLife 6: e25290, 2017. doi:10.7554/eLife.25290.

Rosselli FB, Alemi A, Ansuini A, Zoccolan D. Object similarity affects the perceptual strategy underlying invariant visual object recognition in rats. Front Neural Circuits 9: 10, 2015. doi:10.3389/fncir.2015.00010.

Saleem AB, Ayaz A, Jeffery KJ, Harris KD, Carandini M. Integration of visual motion and locomotion in mouse visual cortex. Nat Neurosci 16: 1864–1869, 2013. doi:10.1038/nn.3567.

Saleem AB, Diamanti EM, Fournier J, Harris KD, Carandini M. Coherent encoding of subjective spatial position in visual cortex and hippocampus. Nature 562: 124–127, 2018. doi:10.1038/s41586-018-0516-1.

Sargolini F, Fyhn M, Hafting T, McNaughton BL, Witter MP, Moser M-B, Moser EI. Conjunctive representation of position, direction, and velocity in entorhinal cortex. Science 312: 758–762, 2006. doi:10.1126/science.1125572.

Sawinski J, Wallace DJ, Greenberg DS, Grossmann S, Denk W, Kerr JND. Visually evoked activity in cortical cells imaged in freely moving animals. Proc Natl Acad Sci USA 106: 19557–19562, 2009. doi:10.1073/pnas.0903680106.

Stackman RW, Taube JS. Firing properties of rat lateral mammillary single units: head direction, head pitch, and angular head velocity. J Neurosci 18: 9020–9037, 1998. doi:10.1523/JNEUROSCI.18-21-09020.1998.

Stahl JS, van Alphen AM, De Zeeuw CI. A comparison of video and magnetic search coil recordings of mouse eye movements. J Neurosci Methods 99: 101–110, 2000. doi:10.1016/S0165-0270(00)00218-1.

Stirman J, Townsend LB, Smith S. A touchscreen based global motion perception task for mice. Vision Res 127: 74–83, 2016. doi:10.1016/j.visres.2016.07.006.

Szuts TA, Fadeyev V, Kachiguine S, Sher A, Grivich MV, Agrochão M, Hottowy P, Dabrowski W, Lubenov EV, Siapas AG, Uchida N, Litke AM, Meister M. A wireless multi-channel neural amplifier for freely moving animals. Nat Neurosci 14: 263–269, 2011. doi:10.1038/nn.2730.

Tafazoli S, Di Filippo A, Zoccolan D. Transformation-tolerant object recognition in rats revealed by visual priming. J Neurosci 32: 21–34, 2012. doi:10.1523/JNEUROSCI.3932-11.2012.

Tafazoli S, Safaai H, De Franceschi G, Rosselli FB, Vanzella W, Riggi M, Buffolo F, Panzeri S, Zoccolan D. Emergence of transformation-tolerant representations of visual objects in rat lateral extrastriate cortex. eLife 6: e22794, 2017. doi:10.7554/eLife.22794.

Taube JS. The head direction signal: origins and sensory-motor integration. Annu Rev Neurosci 30: 181–207, 2007. doi:10.1146/annurev.neuro.29.051605.112854.

Tort ABL, Neto WP, Amaral OB, Kazlauckas V, Souza DO, Lara DR. A simple webcam-based approach for the measurement of rodent locomotion and other behavioural parameters. J Neurosci Methods 157: 91–97, 2006. doi:10.1016/j.jneumeth.2006.04.005.

Tsoar A, Nathan R, Bartan Y, Vyssotski A, Dell'Omo G, Ulanovsky N. Large-scale navigational map in a mammal. Proc Natl Acad Sci USA 108: E718–E724, 2011. doi:10.1073/pnas.1107365108.

Vermaercke B, Op de Beeck HP. A multivariate approach reveals the behavioral templates underlying visual discrimination in rats. Curr Biol 22: 50–55, 2012. doi:10.1016/j.cub.2011.11.041.

Vinken K, Van den Bergh G, Vermaercke B, Op de Beeck HP. Neural representations of natural and scrambled movies progressively change from rat striate to temporal cortex. Cereb Cortex 26: 3310–3322, 2016. doi:10.1093/cercor/bhw111.

Vinken K, Vermaercke B, Op de Beeck HP. Visual categorization of natural movies by rats. J Neurosci 34: 10645–10658, 2014. doi:10.1523/JNEUROSCI.3663-13.2014.

Vinken K, Vogels R, Op de Beeck H. Recent visual experience shapes visual processing in rats through stimulus-specific adaptation and response enhancement. Curr Biol 27: 914–919, 2017. doi:10.1016/j.cub.2017.02.024.

Wallace DJ, Greenberg DS, Sawinski J, Rulla S, Notaro G, Kerr JND. Rats maintain an overhead binocular field at the expense of constant fusion. Nature 498: 65–69, 2013. doi:10.1038/nature12153.

Yartsev MM, Ulanovsky N. Representation of three-dimensional space in the hippocampus of flying bats. Science 340: 367–372, 2013. doi:10.1126/science.1235338.

Yu Y, Hira R, Stirman JN, Yu W, Smith IT, Smith SL. Mice use robust and common strategies to discriminate natural scenes. Sci Rep 8: 1379, 2018. doi:10.1038/s41598-017-19108-w.

Zhang Z. A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22: 1330–1334, 2000. doi:10.1109/34.888718.

Zoccolan D. Invariant visual object recognition and shape processing in rats. Behav Brain Res 285: 10–33, 2015. doi:10.1016/j.bbr.2014.12.053.

Zoccolan D, Di Filippo A. Methodological approaches to the behavioural investigation of visual perception in rodents. In: Handbook of Object Novelty Recognition, edited by Ennaceur A, de Souza Silva MA. Amsterdam, Netherlands: Elsevier, 2018, chapt. 5, p. 69–101. Handbook of Behavioral Neuroscience 27.

Zoccolan D, Graham BJ, Cox DD. A self-calibrating, camera-based eye tracker for the recording of rodent eye movements. Front Neurosci 4: 193, 2010. doi:10.3389/fnins.2010.00193.

Zoccolan D, Oertelt N, DiCarlo JJ, Cox DD. A rodent model for the study of invariant visual object recognition. Proc Natl Acad Sci USA 106: 8748–8753, 2009. doi:10.1073/pnas.0811583106.
