open surf

Notes on the OpenSURF LibraryChristopherEvansJanuary18,20091 SURF:SpeededUpRobustFeaturesInthisdocument,theSURFdetector-descriptorschemeusedintheOpenSURFlibraryisdiscussedindetail. Firstthealgorithmisanalysedfromatheoretical standpointtoprovideadetailedoverviewofhowandwhyitworks. Nextthedesignanddevelopmentchoices for theimplementationof thelibraryarediscussedandjustied. Duringtheimplementation of the library, it was found that some of the ner details of the algorithmhadbeenomittedoroverlooked,soSection1.5servestomakecleartheconceptswhicharenotexplicitlydenedintheSURFpaper[1].1.1 IntegralImagesMuch of the performance increase in SURF can be attributed to the use of an intermediateimage representation known as the Integral Image [7]. The integral image is computedrapidlyfromaninput image andis usedtospeedupthe calculationof anyuprightrectangulararea. GivenaninputimageI andapoint(x, y)theintegral imageI

iscalculatedbythesumofthevaluesbetweenthepointandtheorigin. Formallythiscanbedenedbytheformula:I

(x, y) =ix

i=0jy

j=0I(x, y) (1)Usingtheintegral image, thetaskof calculatingtheareaof anuprightrectangularregionisreducedfouroperations. IfweconsiderarectangleboundedbyverticesA, B,CandDasinFigure1,thesumofpixelintensitiesiscalculatedby:

= A + D (C + B) (2)1Figure1: Area computation using integral imagesSincecomputationtimeisinvarianttochangeinsizethisapproachisparticularlyusefulwhenlargeareasarerequired. SURFmakesgooduseof thispropertytoperformfastconvolutionsofvaryingsizeboxltersatnearconstanttime.1.2 Fast-HessianDetector1.2.1 TheHessianThe SURFdetector is basedonthe determinant of the Hessianmatrix. Inorder tomotivatetheuseoftheHessian,weconsideracontinuousfunctionoftwovariablessuchthat the value of the function at (x, y) is given by f(x, y). The Hessian matrix, H, is thematrixofpartialderivatesofthefunctionf.H(f(x, y)) =_2fx22fxy2fxy2fy2_(3)Thedeterminantofthismatrix,knownasthediscriminant,iscalculatedby:det(H) =2fx22fy2 _2fxy_2(4)The value of the discriminant is usedtoclassifythe maximaandminimaof thefunctionbythesecondorder derivativetest. Sincethedeterminant istheproduct ofeigenvaluesoftheHessianwecanclassifythepointsbasedonthesignoftheresult. Ifthe determinant is negative then the eigenvalues have dierent signs and hence the pointis not a local extremum;if it is positive then either both eigenvalues are positive or botharenegativeandineithercasethepointisclassiedasanextremum.Translatingthistheorytoworkwithimagesratherthanacontinuousfunctionisafairly trivial task. First we replace the function values f(x, y) by the image pixel intensi-ties I(x, y). Next we require a method to calculate the second order partial derivatives oftheimage. Wecancalculatethederivativesbyconvolutionwithanappropriatekernel.Inthecaseof SURFthesecondorderscalenormalisedGaussianisthechosenlterasitallowsforanalysisoverscalesaswellasspace(scale-spacetheoryisdiscussedfurtherlaterinthissection). WecanconstructkernelsfortheGaussianderivativesinx,yandcombined xydirection such that we calculate the four entries of the Hessian matrix. UseoftheGaussianallowsustovarytheamountofsmoothingduringtheconvolutionstageso that the determinant is calculated at dierent scales. Furthermore, since the Gaussianis an isotropic (i.e. circularly symmetric) function, convolution with the kernel allows forrotationinvariance. WecannowcalculatetheHessianmatrix, H, asfunctionof bothspacex = (x, y)andscale.H(x, ) =_Lxx(x, ) Lxy(x, )Lxy(x, ) Lyy(x, )_, (5)Here Lxx(x, ) refers to the convolution of the second order Gaussian derivative2g()x2withtheimageatpointx=(x, y)andsimilarlyforLyyandLxy. ThesederivativesareknownasLaplacianofGaussians.WorkingfromthiswecancalculatethedeterminantoftheHessianforeachpixelinthe image and use the value to nd interest points. This variation of the Hessian detectorissimilartothatproposedbyBeaudet[2].Lowe [4] found performance increase in approximating the Laplacian of Gaussians byadierenceof Gaussians. Inasimiliarmanner, Bay[1] proposedanapproximationtotheLaplacianofGaussiansbyusingboxlterrepresentationsoftherespectivekernels.Figure2illustratesthesimilaritybetweenthediscretisedandcroppedkernelsandtheirboxltercounterparts. Considerableperformanceincreaseisfoundwhentheseltersareusedinconjunctionwiththeintegral imagedescribedinSection1.1. Toqauntifythedierencewecanconsiderthenumberof arrayaccessesandoperationsrequiredintheconvolution. Fora9 9lterwewouldrequire81arrayaccessesandoperationsfortheoriginalrealvaluedlterandonly8fortheboxlterrepresentation. Asthelterisincreasedinsize, thecomputationcostincreasessignicantlyfortheoriginal Laplacianwhilethecostfortheboxltersisindependentofsize.InFigure2theweightsappliedtoeachoftheltersectionsiskeptsimple. FortheDxylter the black regions are weighted with a value of 1, the white regions with a valueof-1andtheremainingareasnotweightedatall. TheDxxandDyyltersareweightedsimilarlybut withthewhiteregionshaveavalueof -1andtheblackwithavalueof2. Simpleweightingallowsforrapidcalculationof areasbutinusingtheseweightsweneedtoaddressthedierenceinresponsevaluesbetweentheoriginalandapproximatedkernels. Bay[1] proposes thefollowingformulaas anaccurateapproximationfor theFigure 2: Laplacian of Gaussian Approximation. Top Row:The discretised and cropped secondorderGaussianderivativesinthex,yandxy-directions. WerefertotheseasLxx, Lyy, Lxy.BottomRow: WeightedBoxlterapproximationsinthex, yandxy-directions. Werefertothese asDxx,Dyy,DxyHessiandeterminantusingtheapproximatedGaussians:det(Happrox) = DxxDyy (0.9Dxy)2(6)In[1] thetwoltersarecomparedindetail andtheresultsconcludethattheboxrepresentations negligible loss in accuracy is far outweighed by the considerable increaseineciencyandspeed. Thedeterminant hereis referredtoas theblobresponseatlocationx = (x, y, ). Thesearchforlocalmaximaofthisfunctionoverbothspaceandscaleyieldstheinterestpointsforanimage. Theexactmethodforextractinginterestpointsisexplainedinthefollowingsection.1.2.2 ConstructingtheScale-SpaceIn order to detect interest points using the determinant of Hessian it is rst necessary tointroduce the notion of a scale-space. A scale-space is a continuous function which can beusedtondextremaacrossallpossiblescales[8]. Incomputervisionthescale-spaceistypically implemented as an image pyramid where the input image is iteratively convolvedwith Gaussian kernel and repeatedly sub-sampled (reduced in size). This method is usedtogreateectinSIFT[4] butsinceeachlayerreliesontheprevious, andimagesneedtoberesizeditisnotcomputationallyecient. Astheprocessingtimeof thekernelsusedinSURFis sizeinvariant, thescale-spacecanbecreatedbyapplyingkernels ofincreasingsizetotheoriginal image. Thisallowsformultiplelayersof thescale-spacepyramidtobeprocessedsimultaneouslyandnegatestheneedtosubsampletheimagehenceprovidingperformanceincrease. Figure3illustrates thedierencebetweenthetraditionalscale-spacestructureandtheSURFcounterpart.Figure3: FilterPyramid. Thetraditionalapproachtoconstructingascale-space(left).TheimagesizeisvariedandtheGuassianlterisrepeatedlyappliedtosmoothsubse-quent layers. The SURF approach (right) leaves the original image unchanged and variesonlytheltersize.The scale-space is divided into a number of octaves, where an octave refers to a seriesofresponsemapsofcoveringadoublingofscale. InSURFthelowestlevelofthescale-space is obtained from the output of the 9 9 lters shown in 2. These lters correspondtoareal valuedGaussianwith=1.2. Subsequentlayersareobtainedbyupscalingthelterswhilstmaintainingthesamelterlayoutratio. Astheltersizeincreasessotoo does the value of the associated Gaussian scale, and since ratios of the layout remainconstantwecancalculatethisscalebytheformula:approx= CurrentFilterSize BaseFilterScaleBaseFilterSize= CurrentFilterSize 1.29Whenconstructinglargerlters, thereareanumberoffactorswhichmustbetakeintoconsideration. The increase in size is restricted by the length of the positive and negativelobesof theunderlyingsecondorderGaussianderivatives. Intheapproximatedltersthelobesizeissetatonethirdthesidelengthofthelterandreferstotheshortersidelength of the weighted black and white regions. Since we require the presence of a centralpixel,thedimensionsmustbeincreasedequallyaroundthislocationandhencethelobesize can increase by a minimum of 2. Since there are three lobes in each lter which mustbethesamesize,thesmalleststepsizebetweenconsecutiveltersis6. FortheDxxandDyyltersthelongersidelengthoftheweightedregionsincreasesby2oneachsidetopreserve structure. Figure 4 illustrates the structure of the lters as they increase in size.1.2.3 AccurateInterestPointLocalisationThe task of localising the scale and rotation invariant interest points in the image can bedividedintothreesteps. Firsttheresponsesarethresholdedsuchthatall valuesbelowFigure4: FilterStructure. Subsequentlterssizesmustdierbyaminimumof 6topreservelterstructure.thepredeterminedthresholdareremoved. Increasingthethresholdlowersthenumberofdetectedinterestpoints, leavingonlythestrongestwhiledecreasingallowsformanymoretodetected. Thereforethethresholdcanbeadaptedtotailorthedetectiontotheapplication.After thresholding, a non-maximal suppression is performed to nd a set of candidatepoints. To do this each pixel in the scale-space is compared to its 26 neighbours, comprisedof the 8 points in the native scale and the 9 in each of the scales above and below. Figure5illustratesthenon-maximal suppressionstep. Atthisstagewehaveasetof interestpoints with minimum strength determined by the threshold value and which are also localmaxima/minimainthescale-space.Figure5: Non-MaximalSuppression. ThepixelmarkedXisselectedasamaximaifitgreaterthanthesurroundingpixelsonitsintervalandintervalsaboveandbelow.Thenal stepinlocalisingthepointsinvolvesinterpolatingthenearbydatatondthelocationinbothspaceandscaletosub-pixelaccuracy. Thisisdonebyttinga3DquadraticasproposedbyBrown[3]. InordertodothisweexpressthedeterminantoftheHessianfunction, H(x, y, ), asaTaylorexpansionuptoquadratictermscenteredatdetectedlocation. Thisisexpressedas:H(x) = H +HxTx +12xT 2Hx2 x (7)The interpolatedlocationof the extremum, x=(x, y, ), is foundbytakingthederivativeofthisfunctionandsettingittozerosuchthat: x = 2Hx21Hx(8)Thederivativeshereareapproximatedbynitedierencesofneighbouringpixels. If xisgreaterthan0.5inthex, yordirectionsweadjustthelocationandperformtheinterpolationagain. Thisprocedureisrepeateduntil xislessthan0.5inall directionsor the the number of predetermined interpolation steps has been exceeded. Those pointswhichdonotconvergearedroppedfromthesetofinterestpointsleavingonlythemoststableandrepeatable.1.3 InterestPointDescriptorTheSURFdescriptordescribeshowthepixel intensitiesaredistributedwithinascaledependentneighbourhoodofeachinterestpointdetectedbytheFast-Hessian. Thisap-proachissimilartothatofSIFT[4]butintegralimagesusedinconjunctionwithltersknownasHaarwaveletsareusedinordertoincreaserobustnessanddecreasecomputa-tiontime. Haarwaveletsaresimplelterswhichcanbeusedtondgradientsinthexandydirections.Figure 6: Haar Wavelets. The left lter computes the response in the x-direction and the rightthe y-direction. Weights are 1 for black regions and -1 for the white. When used with integralimages each wavelet requires just six operations to compute.Extractionof thedescriptor canbedividedintotwodistinct tasks. First eachin-terestpointisassignedareproducibleorientationbeforeascaledependentwindowisconstructedinwhicha64-dimensional vectorisextracted. Itisimportantthatall cal-culationsforthedescriptorarebasedonmeasurementsrelativetothedetectedscaleinorder toachievescaleinvariant results. Theprocedurefor extracingthedescriptor isexplainedfurtherinthefollowing.1.3.1 OrientationAssignmentIn order to achieve invariance to image rotation each detected interest point is assigned areproducibleorientation. Extractionofthedescriptorcomponentsisperformedrelativetothisdirectionsoitisimportantthatthisdirectionisfoundtoberepeatableundervaryingconditions. Todeterminetheorientation,Haarwaveletresponsesofsize4arecalculatedforasetpixelswithinaradiusof6ofthedetectedpoint,wherereferstothescaleatwhichthepointwasdetected. Thespecicsetof pixelsisdeterminedbysamplingthosefromwithinthecircleusingastepsizeof.The responses are weighted with a Gaussian centered at the interest point. In keepingwiththeresttheGaussianisdependentonthescaleof thepointandchosentohavestandard deviation 2.5. Once weighted the responses are represented as points in vectorspace, withthex-responsesalongtheabscissaandthey-responsesalongtheordinate.Thedomimantorientationisselectedbyrotatingacirclesegmentcoveringanangleof3aroundtheorigin. At eachposition, thexandy-responseswithinthesegment aresummedandusedtoformanewvector. Thelongest vector lendsits orientationtheinterestpoint. ThisprocessisillustratedinFigure7.Figure7: OrientationAssignment: Asthewindowslidesaroundtheoriginthecomponentsof the responses are summed to yield the vectors shown here in blue. The largest such vectordetermines the dominant orientation.Insomeapplications,rotationinvarianceinnotrequiredsothisstepcanbeomitted,hence providingfurther performance increase. In[1] this versionof the descriptor isreeredtoasUprightSURF(orU-SURF)andhasbeenshowntomaintainrobustnessforimagerotationsofupto+/ 15 deg.1.3.2 DescriptorComponentsThe rst step in extracting the SURF descriptor is to construct a square window aroundthe interest point. This window contains the pixels which will form entries in the descrip-tor vector and is of size 20, again where refers to the detected scale. Furthermore thewindowisorientedalongthedirectionfoundinSection1.3.1suchthatall subsequentcalculationsarerelativetothisdirection.Figure8: DescriptorWindows. Thewindowsizeis20timesthescaleofthedetectedpointand is oriented along the dominant direction shown in green.Thedescriptorwindowisdividedinto4 4regularsubregions. WithineachofthesesubregionsHaar waveletsof size2arecalculatedfor 25regularlydistributedsamplepoints. Ifwerefertothexandywaveletresponsesbydxanddyrespectivelythenforthese25samplepoints(i.e. eachsubregion)wecollect,vsubregion=_

dx,

dy,

|dx|,

|dy|_. (9)Thereforeeachsubregioncontributesfourvaluestothedescriptorvectorleadingtoanoverallvectoroflength4 4 4 = 64. TheresultingSURFdescriptorisinvarianttorotation,scale,brightnessand,afterreductiontounitlength,contrast.Figure 9: Descriptor Components. The green square bounds one of the 16 subregions and bluecircles represent the sample points at which we compute the wavelet responses. As illustratedthe x and y responses are calculated relative to the dominant orientation.1.4 DesignThissectionoutlinesthedesignchoiceswhichhavebeenmadetoimplementtheSURFpointcorrespondencelibrary.1.4.1 LanguageandEnvironmentC++ has been chosen as the programming language to develop the SURF library for thefollowingreasons:1. Speed: Lowlevel imageprocessingneedstobefast andC++will facilitatetheimplementationofahighlyecientlibraryoffunctions.2. Usability: Inmyresearch, almostall imageprocessingappearstobecarriedoutinC++,CandMatlab. Oncecompletethelibraryistobefullydocumentedandfreelyavailable,soC++seemstheobviouschoiceinmakingausefulcontributiontotheeld.3. Portability: While C++ may not be entirely portable across platforms it is possible,by following strict standards, to write code which is portable across many platformsandcompilers.4. ImageProcessingLibraries: OpenCV1isalibraryof C++functionswhichlendsitselfwell toreal timecomputervision. Itprovidesfunctionalityforreadingdatafrom image les, video les as well as live video feeds direct from a webcam or othervision device. The library is well supported and works on both Linux and Windows.Thechosendevelopment environment for theimplementationof thelibraryis Mi-crosoftVisual C++Express20082. VC++isapowerful IDE(IntegratedDevelopmentEnvironment)allowingforeasycodecreationandvisualprojectorganisation. OpenCValsointegrateswellwiththecompilerandisfullysupported. AsbothOpenCVandVi-sual C++arefree, thiswill allowthenishedSURFlibrarytobedistributedwithoutlicensingrestrictions.1.4.2 ArchitectureDesignThearchitecturedesignprovidesatop-downdecompositionofthelibraryintomodulesandclasses.1Open Source Computer Vision Library. Provides a simple API for working with images and videosin C++. Available from: http://opencvlibrary.sourceforge.net/2Free C++ IDE for Windows. Available from: www.microsoft.com/express/IntegralImageOverview: ModulecreatesandhandlesIntegralImagesInputs: AnImageProcesses: Createstheintegral imagerepresentationofsuppliedinputimage. Calcu-latespixelsumsoveruprightrectangularareas.Outputs: Theintegralimagerepresentation.Fast-HessianOverview: Findshessianbasedinterestpoints/regionsinagivenintegralimage.Inputs: Anintegralimagerepresentationofanimage.Processes: Builds determinant of Hessianresponse map. Performs anonmaximalsuppresiontolocaliseinterestpointsinascale-space. Interpolatesdetectedpointstosub-pixelaccuracy.Outputs: Avectorofaccuratelylocalisedinterestpoints.SURFDescriptor:Overview: Extractsdescriptorcomponentsforagivensetofdetectedinterestpoints.Inputs: Anintegralimagerepresentationofanimage,vectorofinterestpoints.Processes: CalculatesHaarwaveletresponses. Calculatesdominantorientationofaninterest point. Extracts 64-dimensional descriptor vector basedonsums of waveletresponses.Outputs: AvectorofSURFdescribedinterestpoints.InterestPoint:Overview: Storesdataassociatedwitheachindividualinterestpoint.Inputs: InterestPointdata.Processes: Accessor/MutatorMethodsfordata.Outputs: NoneUtilities:Overview: ModulecontainsallnonSURFspecicfunctions.1.4.3 DataStructuresThere are twoimportant non-standarddatastructures whichwill be includedintheimplementationoftheSURFlibrary. ThesearetheIplImagewhichisusedbyOpenCVtostoreimagedata, andtheuser denedIpoint whichwill storedataabout interestpoints. As the IplImage is supplied by an external library, it is not included in the designofthiswork. TheIpointdatatypeanditsinterfaceisdetailedinSection1.4.4.1.4.4 InterfaceDesignTheclass/functioninterfacesaredescribedforeachofthecomponentslistedinSection1.4.2, where the interface refers tothe public components whichcanbe calledfromexternalmodules.Listing1: IntegralImageInterface// ! Returns i nt e g r a l image r e pr e s e nt at i on of s uppl i e d imageIpl Image I nt e g r a l ( Ipl Image img ) ;// ! Cal c ul at e s sum of p i x e l i n t e n s i t i e s i n t he areaf l oat Area ( Ipl Image img , i nt x , i nt y , i nt width , i nt hei ght ) ;Listing2: Fast-HessianInterfaceFastHessi an {// ! Const ruct orFastHessi an ( Ipl Image img ,vector &i pt s, // Vector t o s t or e I poi nt si nt octaves, // Number of Oct aves t o anal ys ei nt i nt e r va l s, // Number of I nt e r v a l s per Octavei nt sample , // Sampl ing s t ep i n t he imagef l oat t hr e s ) ; // Hessi an response t hr e s hol d// ! Find t he image f e at ur e s and wr i t e i nt o vect or of f e at ur e svoid ge t I poi nt s ( ) ;};Listing3: SURFDescriptorInterfaceSur f {// ! Const ruct orSur f ( Ipl Image i nt e gr al i mg ) ;// ! Set t he I poi nt of i n t e r e s tvoid s e t I po i nt ( I poi nt i pt ) ;// ! Assi gn t he s uppl i e d I poi nt an or i e nt at i onvoid ge t Or i e nt at i on ( ) ;// ! Get t he de s c r i pt or vect or of t he provi ded I poi ntvoid ge t De s c r i pt or ( ) ;};Listing4: InterestPointInterfaceI poi nt {// ! Coordi nat es of t he de t e c t e d i nt e r e s t poi ntf l oat x , y ;// ! Det ect ed s c al ef l oat s c a l e ;// ! Or i ent at i on measured ant i c l oc k wi s e from+ve xax i sf l oat o r i e nt a t i o n ;// ! Vector of de s c r i pt or componentsf l oat de s c r i pt o r [ 6 4 ] ;};1.5 ImplementationBasedonthetheoreticalanalysisofSections1.2and1.3,andthedesigninSection1.4,theresultingimplementationof theSURFalgorithmispresented. ThissectionservestoprovidefurtherinsightintoSURFwhilstsimultaneouslymakingcleartheconceptswhich are not explicitly dened in [1]. For each component the methods which have beenusedaredemonstratedfromalgorithmicapproachwithexcerptsofcodeprovidedwherenecessary.1.5.1 IntegralImagesComputation of the Integral is a very simple process. We iterate over all rows and columnssuchthatthevalueintheintegralimageateachpointisthesumofpixelsaboveandtotheleft. Toimplementthisinanecientmannerwecalculateacumulativerowsumaswe move along each row. For the rst row this sum forms the value at each location in theresultingintegral image. Foreachsubsequentrowtheresultingvalueisthecumulativerowsumplusthevalueinthecellaboveintheintegralimage. InsimpliedC++codethisiswrittenas:Listing5: IntegralImageComputation// Loop over rows and c ol s of i nput imagefor ( i nt row=0; row

open surf

Documents