open surf

25
 Notes on the OpenSURF Library Christopher Evans January 18, 2009 1 SURF: Speed ed Up Robust F eatures In this document, the SURF detector-descriptor scheme used in the OpenSURF library is discussed in detail. First the algori thm is analysed from a theo reti cal standpoint to provide a detailed overview of how and why it works. Next the design and development choice s for the implementation of the lib rary are discus sed and justied. During the implementation of the library, it was found that some of the ner details of the algorithm had been omitted or overlooked, so Section 1.5 serves to make clear the concepts which are not explicitly dened in the SURF paper [1]. 1.1 Inte gr al Imag es Much of the performance increase in SURF can be attributed to the use of an intermediate image representation known as the “Integral Image” [7]. The integral image is computed rapidly from an input image and is used to speed up the calculation of any upright rectangu lar area. Given an input image I  and a point (x, y ) the integral image  I   is calculated by the sum of the values between the point and the origin. Formally this can be dened by the formula: I  (x, y ) = ix i=0  j y  j=0 I (x, y ) (1) Using the integral image, the task of calculating the area of an upright rectangular region is reduced four operations. If we cons ide r a rect angl e bounded by vertices A, B, C and D as in Figure 1, the sum of pixel intensities is calculated by:  =  A + D − (C  + B) (2) 1

Upload: shadab-khan

Post on 21-Jul-2015

21 views

Category:

Documents


0 download

TRANSCRIPT

Notes on the OpenSURF LibraryChristopherEvansJanuary18,20091 SURF:SpeededUpRobustFeaturesInthisdocument,theSURFdetector-descriptorschemeusedintheOpenSURFlibraryisdiscussedindetail. Firstthealgorithmisanalysedfromatheoretical standpointtoprovideadetailedoverviewofhowandwhyitworks. Nextthedesignanddevelopmentchoices for theimplementationof thelibraryarediscussedandjustied. Duringtheimplementation of the library, it was found that some of the ner details of the algorithmhadbeenomittedoroverlooked,soSection1.5servestomakecleartheconceptswhicharenotexplicitlydenedintheSURFpaper[1].1.1 IntegralImagesMuch of the performance increase in SURF can be attributed to the use of an intermediateimage representation known as the Integral Image [7]. The integral image is computedrapidlyfromaninput image andis usedtospeedupthe calculationof anyuprightrectangulararea. GivenaninputimageI andapoint(x, y)theintegral imageI

iscalculatedbythesumofthevaluesbetweenthepointandtheorigin. Formallythiscanbedenedbytheformula:I

(x, y) =ix

i=0jy

j=0I(x, y) (1)Usingtheintegral image, thetaskof calculatingtheareaof anuprightrectangularregionisreducedfouroperations. IfweconsiderarectangleboundedbyverticesA, B,CandDasinFigure1,thesumofpixelintensitiesiscalculatedby:

= A + D (C + B) (2)1Figure1: Area computation using integral imagesSincecomputationtimeisinvarianttochangeinsizethisapproachisparticularlyusefulwhenlargeareasarerequired. SURFmakesgooduseof thispropertytoperformfastconvolutionsofvaryingsizeboxltersatnearconstanttime.1.2 Fast-HessianDetector1.2.1 TheHessianThe SURFdetector is basedonthe determinant of the Hessianmatrix. Inorder tomotivatetheuseoftheHessian,weconsideracontinuousfunctionoftwovariablessuchthat the value of the function at (x, y) is given by f(x, y). The Hessian matrix, H, is thematrixofpartialderivatesofthefunctionf.H(f(x, y)) =_2fx22fxy2fxy2fy2_(3)Thedeterminantofthismatrix,knownasthediscriminant,iscalculatedby:det(H) =2fx22fy2 _2fxy_2(4)The value of the discriminant is usedtoclassifythe maximaandminimaof thefunctionbythesecondorder derivativetest. Sincethedeterminant istheproduct ofeigenvaluesoftheHessianwecanclassifythepointsbasedonthesignoftheresult. Ifthe determinant is negative then the eigenvalues have dierent signs and hence the pointis not a local extremum;if it is positive then either both eigenvalues are positive or botharenegativeandineithercasethepointisclassiedasanextremum.Translatingthistheorytoworkwithimagesratherthanacontinuousfunctionisafairly trivial task. First we replace the function values f(x, y) by the image pixel intensi-ties I(x, y). Next we require a method to calculate the second order partial derivatives oftheimage. Wecancalculatethederivativesbyconvolutionwithanappropriatekernel.Inthecaseof SURFthesecondorderscalenormalisedGaussianisthechosenlterasitallowsforanalysisoverscalesaswellasspace(scale-spacetheoryisdiscussedfurtherlaterinthissection). WecanconstructkernelsfortheGaussianderivativesinx,yandcombined xydirection such that we calculate the four entries of the Hessian matrix. UseoftheGaussianallowsustovarytheamountofsmoothingduringtheconvolutionstageso that the determinant is calculated at dierent scales. Furthermore, since the Gaussianis an isotropic (i.e. circularly symmetric) function, convolution with the kernel allows forrotationinvariance. WecannowcalculatetheHessianmatrix, H, asfunctionof bothspacex = (x, y)andscale.H(x, ) =_Lxx(x, ) Lxy(x, )Lxy(x, ) Lyy(x, )_, (5)Here Lxx(x, ) refers to the convolution of the second order Gaussian derivative2g()x2withtheimageatpointx=(x, y)andsimilarlyforLyyandLxy. ThesederivativesareknownasLaplacianofGaussians.WorkingfromthiswecancalculatethedeterminantoftheHessianforeachpixelinthe image and use the value to nd interest points. This variation of the Hessian detectorissimilartothatproposedbyBeaudet[2].Lowe [4] found performance increase in approximating the Laplacian of Gaussians byadierenceof Gaussians. Inasimiliarmanner, Bay[1] proposedanapproximationtotheLaplacianofGaussiansbyusingboxlterrepresentationsoftherespectivekernels.Figure2illustratesthesimilaritybetweenthediscretisedandcroppedkernelsandtheirboxltercounterparts. Considerableperformanceincreaseisfoundwhentheseltersareusedinconjunctionwiththeintegral imagedescribedinSection1.1. Toqauntifythedierencewecanconsiderthenumberof arrayaccessesandoperationsrequiredintheconvolution. Fora9 9lterwewouldrequire81arrayaccessesandoperationsfortheoriginalrealvaluedlterandonly8fortheboxlterrepresentation. Asthelterisincreasedinsize, thecomputationcostincreasessignicantlyfortheoriginal Laplacianwhilethecostfortheboxltersisindependentofsize.InFigure2theweightsappliedtoeachoftheltersectionsiskeptsimple. FortheDxylter the black regions are weighted with a value of 1, the white regions with a valueof-1andtheremainingareasnotweightedatall. TheDxxandDyyltersareweightedsimilarlybut withthewhiteregionshaveavalueof -1andtheblackwithavalueof2. Simpleweightingallowsforrapidcalculationof areasbutinusingtheseweightsweneedtoaddressthedierenceinresponsevaluesbetweentheoriginalandapproximatedkernels. Bay[1] proposes thefollowingformulaas anaccurateapproximationfor theFigure 2: Laplacian of Gaussian Approximation. Top Row:The discretised and cropped secondorderGaussianderivativesinthex,yandxy-directions. WerefertotheseasLxx, Lyy, Lxy.BottomRow: WeightedBoxlterapproximationsinthex, yandxy-directions. Werefertothese asDxx,Dyy,DxyHessiandeterminantusingtheapproximatedGaussians:det(Happrox) = DxxDyy (0.9Dxy)2(6)In[1] thetwoltersarecomparedindetail andtheresultsconcludethattheboxrepresentations negligible loss in accuracy is far outweighed by the considerable increaseineciencyandspeed. Thedeterminant hereis referredtoas theblobresponseatlocationx = (x, y, ). Thesearchforlocalmaximaofthisfunctionoverbothspaceandscaleyieldstheinterestpointsforanimage. Theexactmethodforextractinginterestpointsisexplainedinthefollowingsection.1.2.2 ConstructingtheScale-SpaceIn order to detect interest points using the determinant of Hessian it is rst necessary tointroduce the notion of a scale-space. A scale-space is a continuous function which can beusedtondextremaacrossallpossiblescales[8]. Incomputervisionthescale-spaceistypically implemented as an image pyramid where the input image is iteratively convolvedwith Gaussian kernel and repeatedly sub-sampled (reduced in size). This method is usedtogreateectinSIFT[4] butsinceeachlayerreliesontheprevious, andimagesneedtoberesizeditisnotcomputationallyecient. Astheprocessingtimeof thekernelsusedinSURFis sizeinvariant, thescale-spacecanbecreatedbyapplyingkernels ofincreasingsizetotheoriginal image. Thisallowsformultiplelayersof thescale-spacepyramidtobeprocessedsimultaneouslyandnegatestheneedtosubsampletheimagehenceprovidingperformanceincrease. Figure3illustrates thedierencebetweenthetraditionalscale-spacestructureandtheSURFcounterpart.Figure3: FilterPyramid. Thetraditionalapproachtoconstructingascale-space(left).TheimagesizeisvariedandtheGuassianlterisrepeatedlyappliedtosmoothsubse-quent layers. The SURF approach (right) leaves the original image unchanged and variesonlytheltersize.The scale-space is divided into a number of octaves, where an octave refers to a seriesofresponsemapsofcoveringadoublingofscale. InSURFthelowestlevelofthescale-space is obtained from the output of the 9 9 lters shown in 2. These lters correspondtoareal valuedGaussianwith=1.2. Subsequentlayersareobtainedbyupscalingthelterswhilstmaintainingthesamelterlayoutratio. Astheltersizeincreasessotoo does the value of the associated Gaussian scale, and since ratios of the layout remainconstantwecancalculatethisscalebytheformula:approx= CurrentFilterSize BaseFilterScaleBaseFilterSize= CurrentFilterSize 1.29Whenconstructinglargerlters, thereareanumberoffactorswhichmustbetakeintoconsideration. The increase in size is restricted by the length of the positive and negativelobesof theunderlyingsecondorderGaussianderivatives. Intheapproximatedltersthelobesizeissetatonethirdthesidelengthofthelterandreferstotheshortersidelength of the weighted black and white regions. Since we require the presence of a centralpixel,thedimensionsmustbeincreasedequallyaroundthislocationandhencethelobesize can increase by a minimum of 2. Since there are three lobes in each lter which mustbethesamesize,thesmalleststepsizebetweenconsecutiveltersis6. FortheDxxandDyyltersthelongersidelengthoftheweightedregionsincreasesby2oneachsidetopreserve structure. Figure 4 illustrates the structure of the lters as they increase in size.1.2.3 AccurateInterestPointLocalisationThe task of localising the scale and rotation invariant interest points in the image can bedividedintothreesteps. Firsttheresponsesarethresholdedsuchthatall valuesbelowFigure4: FilterStructure. Subsequentlterssizesmustdierbyaminimumof 6topreservelterstructure.thepredeterminedthresholdareremoved. Increasingthethresholdlowersthenumberofdetectedinterestpoints, leavingonlythestrongestwhiledecreasingallowsformanymoretodetected. Thereforethethresholdcanbeadaptedtotailorthedetectiontotheapplication.After thresholding, a non-maximal suppression is performed to nd a set of candidatepoints. To do this each pixel in the scale-space is compared to its 26 neighbours, comprisedof the 8 points in the native scale and the 9 in each of the scales above and below. Figure5illustratesthenon-maximal suppressionstep. Atthisstagewehaveasetof interestpoints with minimum strength determined by the threshold value and which are also localmaxima/minimainthescale-space.Figure5: Non-MaximalSuppression. ThepixelmarkedXisselectedasamaximaifitgreaterthanthesurroundingpixelsonitsintervalandintervalsaboveandbelow.Thenal stepinlocalisingthepointsinvolvesinterpolatingthenearbydatatondthelocationinbothspaceandscaletosub-pixelaccuracy. Thisisdonebyttinga3DquadraticasproposedbyBrown[3]. InordertodothisweexpressthedeterminantoftheHessianfunction, H(x, y, ), asaTaylorexpansionuptoquadratictermscenteredatdetectedlocation. Thisisexpressedas:H(x) = H +HxTx +12xT 2Hx2 x (7)The interpolatedlocationof the extremum, x=(x, y, ), is foundbytakingthederivativeofthisfunctionandsettingittozerosuchthat: x = 2Hx21Hx(8)Thederivativeshereareapproximatedbynitedierencesofneighbouringpixels. If xisgreaterthan0.5inthex, yordirectionsweadjustthelocationandperformtheinterpolationagain. Thisprocedureisrepeateduntil xislessthan0.5inall directionsor the the number of predetermined interpolation steps has been exceeded. Those pointswhichdonotconvergearedroppedfromthesetofinterestpointsleavingonlythemoststableandrepeatable.1.3 InterestPointDescriptorTheSURFdescriptordescribeshowthepixel intensitiesaredistributedwithinascaledependentneighbourhoodofeachinterestpointdetectedbytheFast-Hessian. Thisap-proachissimilartothatofSIFT[4]butintegralimagesusedinconjunctionwithltersknownasHaarwaveletsareusedinordertoincreaserobustnessanddecreasecomputa-tiontime. Haarwaveletsaresimplelterswhichcanbeusedtondgradientsinthexandydirections.Figure 6: Haar Wavelets. The left lter computes the response in the x-direction and the rightthe y-direction. Weights are 1 for black regions and -1 for the white. When used with integralimages each wavelet requires just six operations to compute.Extractionof thedescriptor canbedividedintotwodistinct tasks. First eachin-terestpointisassignedareproducibleorientationbeforeascaledependentwindowisconstructedinwhicha64-dimensional vectorisextracted. Itisimportantthatall cal-culationsforthedescriptorarebasedonmeasurementsrelativetothedetectedscaleinorder toachievescaleinvariant results. Theprocedurefor extracingthedescriptor isexplainedfurtherinthefollowing.1.3.1 OrientationAssignmentIn order to achieve invariance to image rotation each detected interest point is assigned areproducibleorientation. Extractionofthedescriptorcomponentsisperformedrelativetothisdirectionsoitisimportantthatthisdirectionisfoundtoberepeatableundervaryingconditions. Todeterminetheorientation,Haarwaveletresponsesofsize4arecalculatedforasetpixelswithinaradiusof6ofthedetectedpoint,wherereferstothescaleatwhichthepointwasdetected. Thespecicsetof pixelsisdeterminedbysamplingthosefromwithinthecircleusingastepsizeof.The responses are weighted with a Gaussian centered at the interest point. In keepingwiththeresttheGaussianisdependentonthescaleof thepointandchosentohavestandard deviation 2.5. Once weighted the responses are represented as points in vectorspace, withthex-responsesalongtheabscissaandthey-responsesalongtheordinate.Thedomimantorientationisselectedbyrotatingacirclesegmentcoveringanangleof3aroundtheorigin. At eachposition, thexandy-responseswithinthesegment aresummedandusedtoformanewvector. Thelongest vector lendsits orientationtheinterestpoint. ThisprocessisillustratedinFigure7.Figure7: OrientationAssignment: Asthewindowslidesaroundtheoriginthecomponentsof the responses are summed to yield the vectors shown here in blue. The largest such vectordetermines the dominant orientation.Insomeapplications,rotationinvarianceinnotrequiredsothisstepcanbeomitted,hence providingfurther performance increase. In[1] this versionof the descriptor isreeredtoasUprightSURF(orU-SURF)andhasbeenshowntomaintainrobustnessforimagerotationsofupto+/ 15 deg.1.3.2 DescriptorComponentsThe rst step in extracting the SURF descriptor is to construct a square window aroundthe interest point. This window contains the pixels which will form entries in the descrip-tor vector and is of size 20, again where refers to the detected scale. Furthermore thewindowisorientedalongthedirectionfoundinSection1.3.1suchthatall subsequentcalculationsarerelativetothisdirection.Figure8: DescriptorWindows. Thewindowsizeis20timesthescaleofthedetectedpointand is oriented along the dominant direction shown in green.Thedescriptorwindowisdividedinto4 4regularsubregions. WithineachofthesesubregionsHaar waveletsof size2arecalculatedfor 25regularlydistributedsamplepoints. Ifwerefertothexandywaveletresponsesbydxanddyrespectivelythenforthese25samplepoints(i.e. eachsubregion)wecollect,vsubregion=_

dx,

dy,

|dx|,

|dy|_. (9)Thereforeeachsubregioncontributesfourvaluestothedescriptorvectorleadingtoanoverallvectoroflength4 4 4 = 64. TheresultingSURFdescriptorisinvarianttorotation,scale,brightnessand,afterreductiontounitlength,contrast.Figure 9: Descriptor Components. The green square bounds one of the 16 subregions and bluecircles represent the sample points at which we compute the wavelet responses. As illustratedthe x and y responses are calculated relative to the dominant orientation.1.4 DesignThissectionoutlinesthedesignchoiceswhichhavebeenmadetoimplementtheSURFpointcorrespondencelibrary.1.4.1 LanguageandEnvironmentC++ has been chosen as the programming language to develop the SURF library for thefollowingreasons:1. Speed: Lowlevel imageprocessingneedstobefast andC++will facilitatetheimplementationofahighlyecientlibraryoffunctions.2. Usability: Inmyresearch, almostall imageprocessingappearstobecarriedoutinC++,CandMatlab. Oncecompletethelibraryistobefullydocumentedandfreelyavailable,soC++seemstheobviouschoiceinmakingausefulcontributiontotheeld.3. Portability: While C++ may not be entirely portable across platforms it is possible,by following strict standards, to write code which is portable across many platformsandcompilers.4. ImageProcessingLibraries: OpenCV1isalibraryof C++functionswhichlendsitselfwell toreal timecomputervision. Itprovidesfunctionalityforreadingdatafrom image les, video les as well as live video feeds direct from a webcam or othervision device. The library is well supported and works on both Linux and Windows.Thechosendevelopment environment for theimplementationof thelibraryis Mi-crosoftVisual C++Express20082. VC++isapowerful IDE(IntegratedDevelopmentEnvironment)allowingforeasycodecreationandvisualprojectorganisation. OpenCValsointegrateswellwiththecompilerandisfullysupported. AsbothOpenCVandVi-sual C++arefree, thiswill allowthenishedSURFlibrarytobedistributedwithoutlicensingrestrictions.1.4.2 ArchitectureDesignThearchitecturedesignprovidesatop-downdecompositionofthelibraryintomodulesandclasses.1Open Source Computer Vision Library. Provides a simple API for working with images and videosin C++. Available from: http://opencvlibrary.sourceforge.net/2Free C++ IDE for Windows. Available from: www.microsoft.com/express/IntegralImageOverview: ModulecreatesandhandlesIntegralImagesInputs: AnImageProcesses: Createstheintegral imagerepresentationofsuppliedinputimage. Calcu-latespixelsumsoveruprightrectangularareas.Outputs: Theintegralimagerepresentation.Fast-HessianOverview: Findshessianbasedinterestpoints/regionsinagivenintegralimage.Inputs: Anintegralimagerepresentationofanimage.Processes: Builds determinant of Hessianresponse map. Performs anonmaximalsuppresiontolocaliseinterestpointsinascale-space. Interpolatesdetectedpointstosub-pixelaccuracy.Outputs: Avectorofaccuratelylocalisedinterestpoints.SURFDescriptor:Overview: Extractsdescriptorcomponentsforagivensetofdetectedinterestpoints.Inputs: Anintegralimagerepresentationofanimage,vectorofinterestpoints.Processes: CalculatesHaarwaveletresponses. Calculatesdominantorientationofaninterest point. Extracts 64-dimensional descriptor vector basedonsums of waveletresponses.Outputs: AvectorofSURFdescribedinterestpoints.InterestPoint:Overview: Storesdataassociatedwitheachindividualinterestpoint.Inputs: InterestPointdata.Processes: Accessor/MutatorMethodsfordata.Outputs: NoneUtilities:Overview: ModulecontainsallnonSURFspecicfunctions.1.4.3 DataStructuresThere are twoimportant non-standarddatastructures whichwill be includedintheimplementationoftheSURFlibrary. ThesearetheIplImagewhichisusedbyOpenCVtostoreimagedata, andtheuser denedIpoint whichwill storedataabout interestpoints. As the IplImage is supplied by an external library, it is not included in the designofthiswork. TheIpointdatatypeanditsinterfaceisdetailedinSection1.4.4.1.4.4 InterfaceDesignTheclass/functioninterfacesaredescribedforeachofthecomponentslistedinSection1.4.2, where the interface refers tothe public components whichcanbe calledfromexternalmodules.Listing1: IntegralImageInterface// ! Returns i nt e g r a l image r e pr e s e nt at i on of s uppl i e d imageIpl Image I nt e g r a l ( Ipl Image img ) ;// ! Cal c ul at e s sum of p i x e l i n t e n s i t i e s i n t he areaf l oat Area ( Ipl Image img , i nt x , i nt y , i nt width , i nt hei ght ) ;Listing2: Fast-HessianInterfaceFastHessi an {// ! Const ruct orFastHessi an ( Ipl Image img ,vector &i pt s, // Vector t o s t or e I poi nt si nt octaves, // Number of Oct aves t o anal ys ei nt i nt e r va l s, // Number of I nt e r v a l s per Octavei nt sample , // Sampl ing s t ep i n t he imagef l oat t hr e s ) ; // Hessi an response t hr e s hol d// ! Find t he image f e at ur e s and wr i t e i nt o vect or of f e at ur e svoid ge t I poi nt s ( ) ;};Listing3: SURFDescriptorInterfaceSur f {// ! Const ruct orSur f ( Ipl Image i nt e gr al i mg ) ;// ! Set t he I poi nt of i n t e r e s tvoid s e t I po i nt ( I poi nt i pt ) ;// ! Assi gn t he s uppl i e d I poi nt an or i e nt at i onvoid ge t Or i e nt at i on ( ) ;// ! Get t he de s c r i pt or vect or of t he provi ded I poi ntvoid ge t De s c r i pt or ( ) ;};Listing4: InterestPointInterfaceI poi nt {// ! Coordi nat es of t he de t e c t e d i nt e r e s t poi ntf l oat x , y ;// ! Det ect ed s c al ef l oat s c a l e ;// ! Or i ent at i on measured ant i c l oc k wi s e from+ve xax i sf l oat o r i e nt a t i o n ;// ! Vector of de s c r i pt or componentsf l oat de s c r i pt o r [ 6 4 ] ;};1.5 ImplementationBasedonthetheoreticalanalysisofSections1.2and1.3,andthedesigninSection1.4,theresultingimplementationof theSURFalgorithmispresented. ThissectionservestoprovidefurtherinsightintoSURFwhilstsimultaneouslymakingcleartheconceptswhich are not explicitly dened in [1]. For each component the methods which have beenusedaredemonstratedfromalgorithmicapproachwithexcerptsofcodeprovidedwherenecessary.1.5.1 IntegralImagesComputation of the Integral is a very simple process. We iterate over all rows and columnssuchthatthevalueintheintegralimageateachpointisthesumofpixelsaboveandtotheleft. Toimplementthisinanecientmannerwecalculateacumulativerowsumaswe move along each row. For the rst row this sum forms the value at each location in theresultingintegral image. Foreachsubsequentrowtheresultingvalueisthecumulativerowsumplusthevalueinthecellaboveintheintegralimage. InsimpliedC++codethisiswrittenas:Listing5: IntegralImageComputation// Loop over rows and c ol s of i nput imagefor ( i nt row=0; row