![Page 1: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/1.jpg)
Offline Testing Search Engine Results
![Page 2: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/2.jpg)
the experiment: transition from FAST to Solr
[Diagram: FAST → Solr]
My client was transitioning its site search engine from FAST to Solr and wanted the new search engine to return matching search results. This is unlike most experiments, which involve some form of optimization.
![Page 3: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/3.jpg)
testing methodology
Offline testing doesn't happen in isolation and is often a precursor to A/B testing.
• A/B testing: large-scale quantitative user testing
• user testing: small-scale qualitative testing
• offline testing: focus on differences in search results
• performance testing: focus on speed and queries per second
• regression testing: is anything broken?
![Page 4: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/4.jpg)
technology stack
[Diagram: real user queries are run against FAST and SOLR; the results feed the analysis.]
We ran thousands of real user queries against each search engine and saved the results for analysis.
![Page 5: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/5.jpg)
offline testing metrics
Your choice of metrics will depend on your goals and the availability of information.
• differences in result counts: use either the mean absolute difference in results, also known as mean absolute error (MAE), or the root mean squared error (RMSE) to measure differences in counts. It is helpful to express this metric in relative terms, as a percentage of the average number of results returned by the existing search engine.
• how many results overlap: use the Jaccard index or the Sørensen-Dice index to measure similarity across sets of results.
• rank correlation: use Spearman's rank correlation coefficient (Spearman's rho) to measure correlation across overlapping results.
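As a minimal sketch of the first bullet, the MAE and its relative form could be computed as below. The function names and the sample counts are hypothetical, chosen only for illustration; the existing engine (FAST) is taken as the baseline, as the bullet suggests.

```python
def mean_absolute_error(baseline_counts, new_counts):
    """Mean absolute difference in result counts across queries."""
    if len(baseline_counts) != len(new_counts) or not baseline_counts:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(b - n) for b, n in zip(baseline_counts, new_counts))
    return total / len(baseline_counts)


def relative_to_baseline(error, baseline_counts):
    """Express an error as a fraction of the average baseline result count."""
    mean_baseline = sum(baseline_counts) / len(baseline_counts)
    if mean_baseline == 0:
        return None  # undefined when the baseline returns no results at all
    return error / mean_baseline


# Hypothetical result counts for four queries (not data from the deck).
fast_counts = [10, 0, 25, 4]
solr_counts = [7, 0, 30, 4]

mae = mean_absolute_error(fast_counts, solr_counts)   # (3 + 0 + 5 + 0) / 4
relative_mae = relative_to_baseline(mae, fast_counts)
print(f"MAE = {mae:.2f}, relative MAE = {relative_mae:.1%}")
```

MAE weights every query equally, whereas RMSE penalises a few very large differences more heavily, so the two can tell different stories on the same data.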
![Page 6: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/6.jpg)
offline testing metrics
• precision and recall: precision and recall are often used to measure the quality of information retrieval systems. These metrics implicitly assume that the current set of results is a "gold standard", and are best suited to results that are sorted by relevance.
• click metrics: if you have a record of user interactions with your existing set of search results, you could forecast click metrics for the new search engine. Common metrics include the average number of clicks per query, as well as the average (mean) click rank.
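A minimal sketch of the precision and recall bullet, treating the existing engine's results as the gold standard. The function name is hypothetical; the two result lists reuse the illustrative FAST and SOLR lists shown elsewhere in this deck.

```python
def precision_recall(gold_results, new_results):
    """Precision and recall of new_results against a gold-standard result set.

    precision = share of the new engine's results that are in the gold set
    recall    = share of the gold set that the new engine returned
    """
    gold, new = set(gold_results), set(new_results)
    hits = len(gold & new)
    precision = hits / len(new) if new else None
    recall = hits / len(gold) if gold else None
    return precision, recall


fast_results = list("ABCDEFGHIJ")                      # existing engine (gold standard)
solr_results = ["B", "W", "C", "X", "D", "Y", "Z"]     # new engine

precision, recall = precision_recall(fast_results, solr_results)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # 3/7 and 3/10
```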
![Page 7: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/7.jpg)
analysis framework
[Scatter plot: the axes are # FAST results and # Solr results; the reference line is Solr = FAST; every point represents a query.]
Our clients find that a scatter plot is a helpful way to visualize differences in counts.
In an ideal world, every query would lie on the straight line Solr = FAST, which passes through the origin (indicative of a perfect match in counts between the old and the new engine).
However, bugs and differences in indexation can force points away from that line and onto either the X or the Y axis.
![Page 8: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/8.jpg)
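differences in counts
Example result lists for a single query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z
Across query i:
$F_i$ is the number of FAST results for query i
$S_i$ is the number of SOLR results for query i
Across all queries:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - S_i\right)^2}$$
where n is the number of queries.
The Root Mean Squared Error measures the average difference in the number of search results found.
The Coefficient of Variation expresses the RMSE in relative terms:
$$\mathrm{CV} = \frac{\mathrm{RMSE}}{\text{Avg. \# FAST results}}$$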
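The RMSE and CV definitions on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the tooling used in the project: the function names and the sample counts are hypothetical, and the CV is returned as `None` when FAST returns no results on average, mirroring the "NA" entries in the results tables.

```python
import math


def rmse(fast_counts, solr_counts):
    """Root mean squared difference between F_i and S_i across n queries."""
    n = len(fast_counts)
    if n == 0 or n != len(solr_counts):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((f - s) ** 2 for f, s in zip(fast_counts, solr_counts))
    return math.sqrt(squared / n)


def coefficient_of_variation(fast_counts, solr_counts):
    """RMSE divided by the average number of FAST results."""
    mean_fast = sum(fast_counts) / len(fast_counts)
    if mean_fast == 0:
        return None  # CV is undefined (NA) when FAST returns nothing
    return rmse(fast_counts, solr_counts) / mean_fast


# Hypothetical counts; the first pair matches the 10-versus-7 example lists.
fast_counts = [10, 20, 30, 40]
solr_counts = [7, 24, 30, 40]

print(rmse(fast_counts, solr_counts))                               # sqrt((9 + 16) / 4) = 2.5
print(f"{coefficient_of_variation(fast_counts, solr_counts):.0%}")  # 2.5 / 25 = 10%
```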
![Page 9: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/9.jpg)
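overlap
Example result lists for a single query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z
The slide presents this metric per query (across query i) and across all queries.
You could use the Sørensen-Dice index to measure the similarity of sets for each query. It is bounded between 0 and 1 (1 is desirable and indicative of perfect overlap).
$$\mathrm{Sim}_i(F_i, S_i) = \frac{2\,\lvert F_i \cap S_i \rvert}{\lvert F_i \rvert + \lvert S_i \rvert}$$
$F_i$ is the set of FAST results for query i at rank n (the first n results)
$S_i$ is the set of SOLR results for query i at rank n (the first n results)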
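A short sketch of the Sørensen-Dice index on the example lists, with the Jaccard index alongside for comparison. The function names are hypothetical, and "at rank n" is read here as "the first n results of each engine", which is an assumption.

```python
def sorensen_dice(fast_results, solr_results, n):
    """2 * |F ∩ S| / (|F| + |S|) over the first n results of each engine."""
    f, s = set(fast_results[:n]), set(solr_results[:n])
    if not f and not s:
        return None  # undefined when neither engine returns anything
    return 2 * len(f & s) / (len(f) + len(s))


def jaccard(fast_results, solr_results, n):
    """|F ∩ S| / |F ∪ S| over the first n results of each engine."""
    f, s = set(fast_results[:n]), set(solr_results[:n])
    if not f and not s:
        return None
    return len(f & s) / len(f | s)


fast_results = list("ABCDEFGHIJ")
solr_results = ["B", "W", "C", "X", "D", "Y", "Z"]

# Only B, C and D appear in both lists.
print(round(sorensen_dice(fast_results, solr_results, 10), 2))  # 2*3 / (10 + 7) = 0.35
print(round(jaccard(fast_results, solr_results, 10), 2))        # 3 / 14 = 0.21
print(sorensen_dice(fast_results, solr_results, 5))             # 2*3 / (5 + 5) = 0.6
```

For the same pair of sets the Dice value is never lower than the Jaccard value, which is consistent with the two columns reported in the overlap tables.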
![Page 10: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/10.jpg)
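rank correlation
Example result lists for a single query:
FAST results: A B C D E F G H I J
SOLR results: B W C X D Y Z
The slide presents this metric per query (across query i) and across all queries.
You could use Spearman's rank correlation coefficient to calculate correlations across overlapping results. This metric is bounded between -1 and 1 (1 is desirable and indicative of perfect positive correlation).
$$\rho_i = 1 - \frac{6\sum_{j=1}^{n} d_j^2}{n\left(n^2 - 1\right)}$$
$d_j$ is the difference in ranks for the jth result and is calculated as: FAST rank$_j$ − SOLR rank$_j$
n is the number of overlapping results for query i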
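The formula can be sketched in a few lines. One assumption is worth flagging: the slide does not say whether the ranks are the original positions in each result list or positions within the overlapping results only. The sketch below re-ranks the overlapping results from 1 to n within each list, because the formula only stays within -1 and 1 when both rankings are permutations of 1..n. The function name is hypothetical, and each list is assumed to contain no duplicates.

```python
def spearman_rho(fast_results, solr_results):
    """Spearman's rho over the results that both engines returned.

    Overlapping results are re-ranked 1..n within each list (an assumption),
    then rho = 1 - 6 * sum(d_j^2) / (n * (n^2 - 1)).
    """
    fast_set, solr_set = set(fast_results), set(solr_results)
    overlap_in_fast_order = [r for r in fast_results if r in solr_set]
    overlap_in_solr_order = [r for r in solr_results if r in fast_set]
    n = len(overlap_in_fast_order)
    if n < 2:
        return None  # the denominator n(n^2 - 1) is zero for n < 2
    solr_rank = {r: rank for rank, r in enumerate(overlap_in_solr_order, start=1)}
    sum_d_squared = sum(
        (fast_rank - solr_rank[r]) ** 2
        for fast_rank, r in enumerate(overlap_in_fast_order, start=1)
    )
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


fast_results = list("ABCDEFGHIJ")
solr_results = ["B", "W", "C", "X", "D", "Y", "Z"]

# B, C and D overlap and appear in the same relative order in both lists.
print(spearman_rho(fast_results, solr_results))    # 1.0
print(spearman_rho(list("ABC"), list("CBA")))      # -1.0 (fully reversed order)
```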
![Page 11: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/11.jpg)
differences in counts: before

| Quadrant | Query count | Query share (%) | RMSE | CV (%) |
|---|---|---|---|---|
| FAST > 0, SOLR > 0 | 7,049 | 82% | 1,749,463 | 8,588% |
| FAST > 0, SOLR = 0 | 388 | 5% | 1,718 | 403% |
| FAST = 0, SOLR > 0 | 107 | 1% | 7,078,404 | NA |
| FAST = 0, SOLR = 0 | 1,037 | 12% | 0 | NA |
| Overall | 8,581 | 100% | 1,771,711 | 10,575% |

| Query type | Query count | Query share (%) | RMSE | CV (%) |
|---|---|---|---|---|
| Wildcard queries | 690 | 8% | 99,894 | 298% |
| Loose phrase queries | 925 | 11% | 15,905 | 188% |
| Plural-form queries | 1,090 | 13% | 13,845 | 207% |

Automated insights by query type. Note that a query may be associated with more than one type.

Differences in counts are driven by a handful of queries with 0 or few results in FAST and millions in SOLR.
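The quadrant breakdown could be produced along the following lines. This is a sketch with hypothetical function names and made-up count pairs, not the data behind the table; it covers only the query count and query share columns.

```python
from collections import Counter


def quadrant(fast_count, solr_count):
    """Label a query by whether each engine returned any results."""
    fast_label = "FAST > 0" if fast_count > 0 else "FAST = 0"
    solr_label = "SOLR > 0" if solr_count > 0 else "SOLR = 0"
    return f"{fast_label}, {solr_label}"


def quadrant_summary(count_pairs):
    """Map each quadrant to (query count, share of all queries)."""
    tally = Counter(quadrant(f, s) for f, s in count_pairs)
    total = sum(tally.values())
    return {label: (count, count / total) for label, count in tally.items()}


# Hypothetical (FAST count, SOLR count) pairs, one per query.
pairs = [(10, 7), (3, 0), (0, 2_000_000), (0, 0), (25, 30)]

summary = quadrant_summary(pairs)
for label, (count, share) in sorted(summary.items()):
    print(f"{label}: {count} queries ({share:.0%})")
```

Splitting by quadrant before computing RMSE keeps a few extreme queries, such as the third pair above, from dominating the figure for queries where both engines return results.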
![Page 12: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/12.jpg)
differences in counts: after

| Quadrant | Query count | Query share (%) | RMSE | CV (%) |
|---|---|---|---|---|
| FAST > 0, SOLR > 0 | 5,166 | 61% | 213 | 2% |
| FAST > 0, SOLR = 0 | 125 | 1% | 4,332 | 1,023% |
| FAST = 0, SOLR > 0 | 20 | 0% | 131 | NA |
| FAST = 0, SOLR = 0 | 3,169 | 37% | 0 | NA |
| Overall | 8,480 | 100% | 418 | 28% |

| Query type | Query count | Query share (%) | RMSE | CV (%) |
|---|---|---|---|---|
| Wildcard queries | 665 | 8% | 159 | 4% |
| Loose phrase queries | 910 | 11% | 137 | 19% |
| Plural-form queries | 2,356 | 28% | 1,009 | 69% |

After several iterations, differences in counts are down to 2% on average across queries that return results in both FAST and SOLR. However, there is still more work to be done.
![Page 13: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/13.jpg)
overlap: before

| Result counts * | Result overlap | Jaccard index | Sørensen-Dice index |
|---|---|---|---|
| 5+ | 1 | 0.21 | 0.26 |
| 10+ | 3 | 0.23 | 0.30 |
| 20+ | 7 | 0.26 | 0.34 |

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

On average, just 7 of the first 20 results (and 1 of the first 5 results) overlapped at the start of the process.
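The "result overlap" column could be computed roughly as below. The slide does not say which engine must return 5+, 10+ or 20+ results for a query to be included, so this sketch assumes that both must; that choice, the function name and the sample queries are all hypothetical.

```python
def mean_overlap_at_k(queries, k):
    """Average number of shared results among the first k from each engine.

    `queries` is a list of (fast_results, solr_results) pairs. Only queries
    for which both engines return at least k results are included (assumption).
    """
    overlaps = [
        len(set(fast[:k]) & set(solr[:k]))
        for fast, solr in queries
        if len(fast) >= k and len(solr) >= k
    ]
    if not overlaps:
        return None  # no query returned k or more results in both engines
    return sum(overlaps) / len(overlaps)


queries = [
    (list("ABCDEFGHIJ"), ["B", "W", "C", "X", "D", "Y", "Z"]),  # 3 shared in the top 5
    (list("ABCDE"), list("ABCDE")),                              # 5 shared in the top 5
    (list("AB"), list("AB")),                                    # excluded: fewer than 5 results
]

print(mean_overlap_at_k(queries, 5))    # (3 + 5) / 2 = 4.0
print(mean_overlap_at_k(queries, 10))   # None: no query has 10+ results in both engines
```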
![Page 14: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/14.jpg)
overlap: after

| Result counts * | Result overlap | Jaccard index | Sørensen-Dice index |
|---|---|---|---|
| 5+ | 5 | 0.89 | 0.92 |
| 10+ | 9 | 0.90 | 0.93 |
| 20+ | 19 | 0.91 | 0.94 |

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

After several iterations, overlap has risen to 19 results on page 1, up from 7 at the start of the process. Crucially, 5 of the first 5 results overlap on average.
![Page 15: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/15.jpg)
rank correlation: before

| Result counts * | Spearman's rank (date desc.) |
|---|---|
| 5+ | 0.97 |
| 10+ | 0.96 |
| 20+ | 0.96 |

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

Rank correlation was particularly strong from the outset, at 0.96 out of a maximum of 1, as almost all searches are sorted by date rather than by relevance.
![Page 16: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/16.jpg)
rank correlation: after

| Result counts * | Spearman's rank (date desc.) |
|---|---|
| 5+ | 0.98 |
| 10+ | 0.98 |
| 20+ | 0.99 |

* Measured across the first 5, 10 and 20 results for queries that returned 5+, 10+ and 20+ results respectively.

After several iterations, rank correlation across overlapping results has improved further, to 0.99 out of a maximum of 1.
![Page 17: Offline Testing Search Engine Results](https://reader031.vdocuments.net/reader031/viewer/2022021918/58a9d9481a28aba05b8b53bd/html5/thumbnails/17.jpg)
This presentation illustrated how you could offline test search engine results. However, as every implementation is unique, please contact us to discuss your needs.