End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Chenyan Xiong1, Zhuyun Dai1, Jamie Callan1, Zhiyuan Liu2, and Russell Power3
1: Language Technologies Institute, Carnegie Mellon University
2: Tsinghua University
3: Semantic Scholar, Allen Institute for AI
© 2017 Zhuyun Dai

Posted: 11-Aug-2020

TRANSCRIPT

Page 1: End-to-End Neural Ad-hoc Ranking with Kernel Pooling (zhuyund/papers/end-end-neural-slide.pdf · 2020-04-19)

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Chenyan Xiong1, Zhuyun Dai1, Jamie Callan1, Zhiyuan Liu2, and Russell Power3

1: Language Technologies Institute, Carnegie Mellon University  2: Tsinghua University

3: Semantic Scholar, Allen Institute for AI

© 2017 Zhuyun Dai

Page 2:

Motivation: Soft Match
• Soft match between queries and documents has drawn much attention in IR

• Bridges the vocabulary gap: `you-know-who' -> `Voldemort'

A Recent Success: The Deep Relevance Matching Model (DRMM)
• Word2vec for matching query terms to document terms
• Histogram Pooling to `count' matches of different qualities

Are the current word embeddings the right choice for IR?
• sim(hotel, motel) = 0.9, sim(Tokyo, London) = 0.9
• Query: `Tokyo hotels'
• Documents: `Tokyo motels' ☺   `Hotel in London' ☹

Page 3:

Key Ideas in Our Work
• Build on the ideas in the Deep Relevance Matching Model (DRMM)
• Kernel-based Neural Ranking Model (K-NRM)
• Learn a word-similarity metric tailored for matching query and document in ad-hoc ranking
• End-to-end learning of a word embedding, supervised by relevance signals, with a new kernel pooling layer that enforces multi-level soft-match patterns


Page 4:

[Figure: K-NRM architecture. A query (n words) and a document (m words) pass through a 300-d Embedding Layer; a Translation Layer builds the translation matrix M (n × m); Kernel Pooling turns M into soft-TF ranking features; a Learning-To-Rank layer (w, b) produces the final ranking score.]

Embedding-Based Translation Model
Query-word to document-word translation model.
• Embeds each word as a continuous vector
• Cosine similarities as translation scores
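As a small NumPy sketch (function and variable names are mine, not from the slides): each word is a 300-d embedding, and entry M[i, j] of the translation matrix is the cosine similarity between query word i and document word j.

```python
import numpy as np

def translation_matrix(query_emb: np.ndarray, doc_emb: np.ndarray) -> np.ndarray:
    """Translation matrix M (n x m) with M[i, j] = cos(q_i, d_j).

    query_emb: (n, 300) embeddings of the n query words
    doc_emb:   (m, 300) embeddings of the m document words
    """
    # Normalize rows so a plain dot product gives cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    return q @ d.T
```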


K-NRM Model

Page 5:

[Figure: K-NRM architecture diagram, repeated from Page 4.]

Kernel Pooling
• How many scores are softly in [1, 0.8]? In [0.8, 0.6]? ...
• RBF kernel: e.g. $\mu = 0.9$, $\sigma = 0.1$ softly counts the scores near 0.9
• Soft-TF: softly counting soft-match term frequencies

• $K_k(M_i) = \sum_{j} \exp\!\left(-\frac{(M_{ij}-\mu_k)^2}{2\sigma_k^2}\right)$
• $\vec{K}(M_i) = \{K_1(M_i), \dots, K_K(M_i)\}$
• K soft-TF features for each query term.

Kernel Pooling (1)
[Figure: kernels K1, K2, K3 plotted over word-pair similarity, mapping the similarities of $q_1$ to $\{d_1, \dots, d_n\}$ to soft-TF values.]
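A minimal NumPy sketch of this layer (names are illustrative): for one query term, each RBF kernel softly counts how many document words have a similarity near its mean $\mu_k$.

```python
import numpy as np

def kernel_pooling(M_i, mus, sigmas):
    """Soft-TF features for one query term.

    M_i:    (m,) similarities between query term i and the m document words
    mus:    (K,) kernel means
    sigmas: (K,) kernel widths
    Returns (K,): K_k(M_i) = sum_j exp(-(M_ij - mu_k)^2 / (2 sigma_k^2)).
    """
    M_i = np.asarray(M_i, dtype=float)[None, :]        # (1, m)
    mus = np.asarray(mus, dtype=float)[:, None]        # (K, 1)
    sigmas = np.asarray(sigmas, dtype=float)[:, None]  # (K, 1)
    return np.exp(-(M_i - mus) ** 2 / (2.0 * sigmas ** 2)).sum(axis=1)
```

A similarity exactly at $\mu_k$ contributes 1 to kernel k; similarities far from $\mu_k$ contribute almost nothing, which is the "soft count".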


K-NRM Model

Page 6:

[Figure: K-NRM architecture diagram, repeated from Page 4.]

Kernel Pooling (2)
Aggregate the per-query-term soft-TF features with log-sum: $\phi(M) = \sum_{i=1}^{n} \log \vec{K}(M_i)$.
Output: $\phi(M)$, K ranking features for a query-document pair.
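A sketch of the full pooling step in NumPy (my naming; the log floor `eps` is an assumption to keep the log finite when a kernel's soft count is near zero):

```python
import numpy as np

def soft_tf_features(M, mus, sigmas, eps=1e-10):
    """phi(M): K ranking features for one query-document pair.

    M: (n, m) translation matrix; mus, sigmas: (K,) kernel parameters.
    Applies the K kernels to each query term's row of M, then
    aggregates over query terms with log-sum:
        phi_k(M) = sum_i log K_k(M_i)
    """
    M = np.asarray(M, dtype=float)[:, :, None]               # (n, m, 1)
    mus = np.asarray(mus, dtype=float)[None, None, :]        # (1, 1, K)
    sigmas = np.asarray(sigmas, dtype=float)[None, None, :]  # (1, 1, K)
    K = np.exp(-(M - mus) ** 2 / (2.0 * sigmas ** 2)).sum(axis=1)  # (n, K)
    return np.log(np.maximum(K, eps)).sum(axis=0)            # (K,)
```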


K-NRM Model

Page 7:

[Figure: K-NRM architecture diagram, repeated from Page 4.]

Learning-To-Rank
Combine the ranking features into the ranking score:
$f(q, d) = \tanh(w^\top \phi(M) + b)$
Trained in a pairwise manner.
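A sketch of the scoring and training step (`w`, `b`, and the margin value are illustrative; the slides only state that training is pairwise): the score combines the K features, and a standard pairwise hinge loss prefers the relevant document d+ over d-.

```python
import numpy as np

def ranking_score(phi, w, b):
    """f(q, d) = tanh(w . phi(M) + b): combine the K soft-TF features."""
    return float(np.tanh(np.dot(w, phi) + b))

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise loss: prefer d+ over d- by at least `margin`."""
    return max(0.0, margin - score_pos + score_neg)
```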


K-NRM Model

Page 8:

• A sample of an online search log from Sogou, a major Chinese search engine

Dataset

[Dataset statistics table; no overlap between the splits.]


Experimental Methodology

Page 9:

We train and test our model using user clicks (manual labels are not available). Testing scenarios:

• DCTR: Document Click-Through Rate model [A. Chuklin et al. 2015]
• TACM: Time-Aware Click Model [Y. Liu et al. 2016]
• Raw Click: only the clicked document is relevant (single-click sessions)

Training and Testing Labels

• Testing-SAME: train on DCTR labels, test on DCTR labels
• Testing-DIFF: train on DCTR labels, test on TACM labels
• Testing-RAW: train on DCTR labels, test on raw clicks


Experimental Methodology

Page 10:

K-NRM Model & Baselines
K-NRM Embedding
• 300-d, initialized with word2vec trained on titles
K-NRM: 11 Kernels
• 1 exact-match kernel: $\mu = 1$, $\sigma = 0.0001$
• 10 soft-match kernels, equally distributed in [-1, 1] (the cosine similarity value range): $\mu = -0.9, -0.7, \dots, 0.7, 0.9$; $\sigma = 0.1$
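The kernel configuration above can be generated as follows (a small helper of my own, not from the slides):

```python
import numpy as np

def knrm_kernels():
    """11 kernels: 1 exact-match kernel (mu = 1, sigma = 1e-4) plus 10
    soft-match kernels with means -0.9, -0.7, ..., 0.9 and sigma = 0.1."""
    # Round to kill floating-point drift from arange.
    soft_mus = np.round(np.arange(-0.9, 1.0, 0.2), 1)   # 10 values
    mus = np.concatenate(([1.0], soft_mus))
    sigmas = np.concatenate(([1e-4], np.full(10, 0.1)))
    return mus, sigmas
```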


Baselines
1. Unsupervised word-based retrieval: LM, BM25
2. Word-based LeToR: RankSVM, Coor-Ascent (20 IR-fusion features)
3. Recent neural IR methods: Trans, DRMM, CDSSM

Experimental Methodology

Page 11:

• Train and test using the same click model
• Easier task

Overall Performance:

Testing-SAME (Train: DCTR, Test: DCTR)


Testing-SAME

Page 12:

• Test with a more accurate click model
• Harder task

Overall Performance:

Testing-DIFF (Train: DCTR, Test: TACM)


Testing-DIFF

Page 13:

• Only the (single) clicked document is relevant
• The most difficult task
• K-NRM ranks the clicked document about one position higher
• K-NRM fits the underlying relevance signal rather than the training click model

Overall Performance:

Testing-RAW (Train: DCTR, Test: Raw Click)
[Chart: mean rank of the clicked document (Rank = 1/MRR): 4.1 vs. 3.0, about one position higher for K-NRM.]


Testing-RAW

Page 14:

Embeddings with different levels of IR specialization:
0. Exact-match: K-NRM using only the exact-match kernel
1. Word2vec: pre-trained on surrounding text
2. Click2vec: pre-trained with (query-term, clicked-doc-term) pairs
3. Full model: trained end-to-end with pairwise user preferences
(3) > (2) > (1) > (0): more IR-specialized embeddings are more effective.

*Testing-SAME and Testing-DIFF had similar performance.


Sources of Effectiveness #1: IR-Specialized Soft Match

Page 15:

Three different pooling methods, all end-to-end:
1. Max-pool: use only the best match
2. Mean-pool: all soft-match scores are mixed together
3. Full model: kernel pooling enforces multi-level soft match

(3) > (2) > (1): multi-level soft match provides more information.

*Testing-SAME and Testing-DIFF had similar performance.


Sources of Effectiveness #2: Multi-Level Soft Match

Page 16:

Recall our motivation: "content-based word embeddings may not fit ad-hoc search."

• This proves true: 58% of word2vec word pairs were moved across kernels by K-NRM.

• How does K-NRM move word pairs?


Effects on Word Embeddings

Page 17:

Word Pair Movements
1. Word pairs decoupled: considered related by word2vec, but not by K-NRM. More than 90% of such pairs are decoupled.
   Examples: (wife, husband), (son, daughter), (China-Unicom, China-Mobile), (Maserati, car), (website, homepage)
2. New soft matches discovered: pairs that appear less frequently in the same surrounding context, but convey similar search intent.
   Examples: (MH370, search), (pdf, reader), (BMW, contact-us), (192.168.0.1, router)
3. Matching strength level changed (weak -> strong or strong -> weak).
   Examples: (MH370, truth), (cloud, share), (oppor9, OPPOR), (10086, www.10086.com)


Effects on Word Embeddings

Page 18:

Conclusion
• Embeddings trained for search are more effective than embeddings trained from surrounding text.
• Embeddings and soft-match bins must be tuned together.
• We propose a new kernel pooling technique:
  • Allows end-to-end training of the word embedding
  • Guides the embedding to form effective multi-level soft-match patterns tailored for ad-hoc ranking
• It delivers robust soft match between queries and documents:
  • Moved 58% of word2vec word pairs across kernels
  • Decouples more than 90% of word pairs that word2vec considered related
  • Discovers new soft-match patterns of different types


Page 19:

Thank you!

Questions?


Page 20:

Testing labels

• The cut-off threshold is chosen so that the label distribution matches TREC's.


Page 21:

Performance of tail queries


Page 22:

Requirement of training data


Page 23:

Sensitivity to kernel parameters


Page 24:

[Figure: K-NRM architecture diagram, repeated from Page 4.]

Kernel Pooling: Revisit Histogram Pooling
DRMM: Histogram Pooling
• How many word pairs' similarity scores are in [1, 0.8], in [0.8, 0.6], ...?

• $B_k(M_i) = \sum_{j} \mathbf{1}\{M_{ij} \in \text{bin}_k\}$
• $\vec{B}(M_i) = \{B_1(M_i), \dots, B_K(M_i)\}$
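For contrast, a NumPy sketch of this histogram pooling (the bin layout over [-1, 1] is my assumption): bin membership is a hard 0/1 indicator, so unlike the RBF kernels it passes no gradient back to the word embeddings.

```python
import numpy as np

def histogram_pooling(M_i, n_bins=5):
    """Hard histogram over [-1, 1]: B_k(M_i) = #{j : M_ij in bin k}.

    M_i: (m,) similarities between one query term and the m document words.
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(np.asarray(M_i, dtype=float), bins=edges)
    return counts
```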


K-NRM Model