
Post on 30-Mar-2018


Leveraging Procedural Knowledge for Task-Oriented Search

Zi Yang, Eric Nyberg

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

{ziy,ehn}@cs.cmu.edu

Outline

• Background
• Problem Definition
• Proposed Approach
• Experiment
• Conclusion

Entity-centric vs. Task-oriented Search

• Entity-centric search
– Seek for attribute, feature, related entity, action, etc.
• Task-oriented search
– Solution seeking and decision support.

Example task: organize a conference
• choose a hotel
• compare banquet options
• recruit volunteers
• contact the publisher
• consider the number and size of conference rooms
• arrange meal catering and menu plan
• check for discounted rate

How do searchers accomplish tasks using interactive search?
• Decompose the task into required subtasks manually
• Formulate queries manually

How do Search Engines Assist Searchers?

• Query suggestion as an example

Entity-centric search
– Suggest attribute, feature, related entity, action, etc.
– Knowledge of attributes and features: descriptive knowledge
– Descriptive knowledge base
– Existing solutions

Task-oriented search
– Suggest required subtasks, actions, solutions, etc.
– Knowledge exercised in the accomplishment of a task, i.e. how to do things: procedural knowledge
– Procedural knowledge base
– Problem studied in this work

Think Reversely!

• Can we learn procedural knowledge from users’ search activities and/or query suggestions, and build a PKB automatically?

Task-oriented search
– Suggest required subtasks, actions, solutions, etc.
– Knowledge exercised in the accomplishment of a task, i.e. how to do things: procedural knowledge
– Automatically built PKB (procedural knowledge base)
– Problem also studied in this work

Related Work

• Search intent & task-oriented search
– Complex search task assistance from query logs [Hassan et al. 2012, 2014]
– Task-oriented questions and how-to Web queries [Weber 2012]
– IMine, Subtask Mining @ NTCIR [Liu 2014]
• Procedural knowledge acquisition
– Ontologies proposed for structured representation of procedural knowledge [Fukazawa 2010, Pareti 2014]
– Extraction based on structural information [Jung 2010], definition of rules or templates [Addis 2009]
– Terminology: goal vs. target vs. purpose, instruction vs. action sequence, step vs. action, etc.

Outline

• Background
• Problem Definition
– Terminology
– Problem 1: Search Task Suggestion (STS)
– Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)
– STS and APKBC
• Proposed Approach
• Experiment
• Conclusion

Terminology

Procedural knowledge graph/base (PKB)

Example articles: How to Clean a Birdbath, How to Fix a Leaky Faucet

• A task
– A short and concise summary
– A detailed explanation
• Is-achieved-by relation between a parent task and a list of subtasks
– Numbered “Steps”
– Bulleted substeps
– Outgoing free links
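The terminology above can be pictured as a small tree structure. The following is a minimal sketch, not the authors' implementation: the class name `Task`, the helper `count_tasks`, the notion of a "primitive" task as one without subtasks, and the subtask texts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One node of a procedural knowledge base (hypothetical layout)."""
    summary: str                       # a short and concise summary
    explanation: str = ""              # a detailed explanation
    # is-achieved-by relation: the ordered list of subtasks
    subtasks: List["Task"] = field(default_factory=list)

    def add_subtask(self, subtask: "Task") -> "Task":
        self.subtasks.append(subtask)
        return subtask

    def is_primitive(self) -> bool:
        """Assumed meaning: a task that has no subtasks."""
        return not self.subtasks


def count_tasks(task: Task) -> int:
    """Count a task together with all of its descendants."""
    return 1 + sum(count_tasks(s) for s in task.subtasks)


# Illustrative content only; the subtask texts are made up for this sketch.
root = Task("clean a birdbath", "Keep the basin fresh for visiting birds.")
root.add_subtask(Task("dump out the old water"))
scrub = root.add_subtask(Task("scrub the basin"))
scrub.add_subtask(Task("rinse thoroughly"))
```

Storing subtasks as an ordered list keeps the numbered "Steps" in sequence, which a plain set of edges would lose.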

Problem 1: Search Task Suggestion (STS)

• When users turn to search engines for information seeking and problem solving, how to leverage existing procedural knowledge to suggest sub search tasks (i.e. queries)?

Search Task Suggestion: Given a procedural knowledge graph G and a task-oriented search task q, we aim to
1(a) identify the task t from T that the user intends to accomplish (search task q → task t);
1(b) retrieve a list of n subtasks s1, …, sn;
1(c) suggest the corresponding sub search tasks p1, …, pk.

Problem 2: Automatic Procedural Knowledge Base Construction (APKBC)

• Users still face ad hoc situations (tasks) that are not covered by an existing PKB, but other searchers may have interacted with search engines to attempt a solution.
• Can we construct a PKB using search queries and relevant documents returned from search engines?

Automatic Procedural Knowledge Base Construction: Given a task t, we aim to
2(a) identify a search task q (task t → search task q);
2(b) collect k related search tasks p1, …, pk;
2(c) identify n (≤ k) search tasks to generate n tasks s1, …, sn, with text descriptions, that can be performed to accomplish the task t.

Outline

• Background
• Problem Definition
• Proposed Approach
– Basic Idea
– Three-way Parallel Corpus Construction
– Feature Definition and Model Construction
• Experiment
• Conclusion

Queryable Phrase/Task Description Extraction: Basic Idea

• Joint learning from available artifacts

Existing PKBs
• Can indicate how to accomplish tasks
• Are not optimized for interactive search

Existing search logs
• Can reveal how to formulate queries
• Cannot cover how to search for procedural knowledge

Existing Web documents
• Can exemplify how to describe tasks
• Do not focus on procedure

Can we take advantage of all the artifacts and have them learn from each other?

Components: query phrase extraction, three-way parallel corpus construction, task description extraction

Three-way Parallel Corpus Construction

• Parallel corpus := a set of matching triples ⟨a query q, a task t, a textual context c⟩

• Example: Grow Taller, http://www.wikihow.com/Grow-Taller

Three-way Parallel Corpus Construction (cont’d)

• Step 1: Extracting seed triples from the search query log
– Scan through the entire search query log to find each query q that matches the description of task t. Exact matching is used in the experiment.
– Extract the textual content from the top relevant documents to retrieve the context c.

Task descriptions in PKBs (Grow Taller)
• If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty or during puberty, then it’s a good idea to see a doctor…
• The human growth hormone (HGH) is produced naturally in our bodies, especially during deep or slow wave sleep. Getting good, sound sleep will encourage the production of HGH, which is created in the pituitary gland.
• …There are tons of “grow taller” exercises on the Internet, which claim to help you grow…

Contexts retrieved from the Web
• …If you’re from a tall family and you’re not growing by your mid-teens, or if your height hasn’t changed much from before puberty to during puberty, then it’s a good idea to see a doctor.
• The growth hormone (HGH) is produced naturally in the pituitary gland during deep or slow wave sleep.

Search queries in a session
• grow taller
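Step 1 with exact matching can be sketched as below. This is an illustration under assumptions, not the authors' code: the names `Triple`, `normalize` and `seed_triples` are hypothetical, the normalization (lowercasing, keeping only ASCII letters and digits) is one plausible reading of "exact matching", and the context field is left empty because fetching documents is out of scope here.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class Triple(NamedTuple):
    query: str      # q, as issued by the searcher
    task: str       # t, the task summary from the PKB
    context: str    # c, text retrieved from the Web (empty until fetched)


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def seed_triples(queries: Iterable[str], task_summaries: Iterable[str]) -> List[Triple]:
    """Pair each distinct query with the task summary it matches exactly."""
    by_key: Dict[str, str] = {normalize(s): s for s in task_summaries}
    seen = set()
    triples = []
    for q in queries:
        key = normalize(q)
        if key and key in by_key and q not in seen:
            seen.add(q)
            triples.append(Triple(q, by_key[key], ""))
    return triples


log = ["grow taller", "Grow-Taller!", "cheap flights", "grow taller"]
seeds = seed_triples(log, ["Grow Taller", "Slim Down"])
```

Indexing the task summaries by their normalized form makes the scan over the query log a single pass with constant-time lookups.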

Three-way Parallel Corpus Construction (cont’d)

• Step 2 (optional): Manually creating search tasks for tasks in the PKB
– Use the summary of the task t to form a search query q and issue it to the search engine to extract context c.
– Exclude this triple due to “artificiality”!


Three-way Parallel Corpus Construction (cont’d)

• Step 3: Collecting related queries
– Combine the user-issued queries from the same session (from Step 1) and the list of queries suggested by the search engine (from Steps 1 and 2).


Search queries in a session
• grow taller
• human growth hormone
• grow taller exercises

Three-way Parallel Corpus Construction (cont’d)

• Step 4: Expanding the parallel corpus
– For each related query p, find the subtask among s1, …, sn that contains p in its summary or explanation, and retrieve its context d.
– Discard unmatched related queries or task descriptions.


Exact matching is used in the experiment.

Three-way Parallel Corpus Construction (cont’d)

• Step 5: Annotating BIO
– Find the contiguous sequence of words from the task t (context c) that is most relevant to the query q (task t’s summary or explanation).


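Example labels: BQ IQ (a two-token query phrase), BQ IQ IQ (a three-token query phrase), BTE ITE … (a task explanation span).

Exact matching is used for annotating the task in the experiment.

For the context, selected the sentences that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation, and annotated the minimal span that contains those overlapping tokens.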
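The "minimal span" annotation can be sketched with a sliding window. This sketch covers only the simpler case in which every target token must appear in the sentence; the 70% threshold for explanations is not modeled. The function names and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def minimal_span(sentence: List[str], targets: List[str]) -> Optional[Tuple[int, int]]:
    """Shortest window of `sentence` covering every distinct token in `targets`.

    Returns (start, end) with `end` exclusive, or None if a token is missing.
    """
    need = set(targets)
    if not need or not need.issubset(sentence):
        return None
    have: Counter = Counter()
    covered = 0
    best = None
    left = 0
    for right, tok in enumerate(sentence):
        if tok in need:
            have[tok] += 1
            if have[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all targets.
        while covered == len(need):
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            drop = sentence[left]
            if drop in need:
                have[drop] -= 1
                if have[drop] == 0:
                    covered -= 1
            left += 1
    return best


def bio_labels(sentence: List[str], targets: List[str], tag: str) -> List[str]:
    """Label the minimal span with B<tag>/I<tag> and everything else with O."""
    labels = ["O"] * len(sentence)
    span = minimal_span(sentence, targets)
    if span is not None:
        start, end = span
        labels[start] = "B" + tag
        for i in range(start + 1, end):
            labels[i] = "I" + tag
    return labels


tokens = "there are tons of grow taller exercises on the internet".split()
labels = bio_labels(tokens, ["grow", "taller"], "Q")
```

For the example sentence, the two tokens of "grow taller" receive BQ and IQ and every other token receives O.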

Feature Definition

• Feature list for both context and task

Category: description (count), with motivation
• Location (LOC): appears in the task summary and explanation (2)
  Motivation: “Skimmable information that readers can quickly understand” should be provided in the title and the beginning sentence of each step.
• Part of speech (POS) (36)
  Motivation: Both the article title and the first sentence in each step begin with a verb in bare infinitive form.
• Parse (PAR): basic Stanford dependency types (50); named entity, noun phrase, verb phrase (3)
  Motivation: Identify the task facets (subsidiary resources or constraints, etc.)
• Word: surface, stem, TF-IDF score (3)
• Context: surface, stem, TF-IDF score, POS tags of previous/next word (78)

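Model Construction

• Word sequence labeling for query construction, task summary construction and task explanation construction

Problem: all three are word sequence labeling problems.
Features: the same feature set, except that location is only used for query construction.

Query construction
– Model: $M_Q$
– Training set: features $X_t$, labels $Y_t$ extracted from task descriptions
– Prediction objective: $y_t^* = \arg\max_{y_t \in \{BQ, IQ, O\}^{|t|}} p(y_t \mid x_t; M_Q)$
– Output: $y_t^*$ = O … O BQ IQ IQ O … O

Task summary construction
– Model: $M_{TS}$
– Training set: features $X_c$, labels $Y_c$ extracted from contexts
– Prediction objective: $y_c^* = \arg\max_{y_c \in \{BTS, ITS, O\}^{|c|}} p(y_c \mid x_c; M_{TS})$

Task explanation construction
– Model: $M_{TE}$
– Training set: features $X_c$, labels $Y_c$ extracted from contexts
– Prediction objective: $y_c^* = \arg\max_{y_c \in \{BTE, ITE, O\}^{|c|}} p(y_c \mid x_c; M_{TE})$

Output for summary and explanation: $y_c^*$ = O … O BTS ITS ITS O … O BTE ITE ITE O … O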
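Once a model has produced a label sequence such as O … O BQ IQ IQ O … O, the labeled phrases still have to be read back out. The helper below is a hypothetical sketch of that decoding step; it assumes one label per token and treats an I-label with no preceding B-label of the same type as O.

```python
from typing import List, Optional, Tuple


def decode_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (tag, phrase) pairs from B*/I*/O labels, e.g. BQ IQ IQ -> ("Q", phrase)."""
    spans: List[Tuple[str, str]] = []
    current_tag: Optional[str] = None
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B"):
            # A new span starts; flush any span still open.
            if current:
                spans.append((current_tag, " ".join(current)))
            current_tag, current = lab[1:], [tok]
        elif lab.startswith("I") and current and lab[1:] == current_tag:
            current.append(tok)
        else:
            # O label, or an I-label that does not continue the open span.
            if current:
                spans.append((current_tag, " ".join(current)))
            current_tag, current = None, []
    if current:
        spans.append((current_tag, " ".join(current)))
    return spans


query_spans = decode_spans(
    "the human growth hormone is produced".split(),
    ["O", "BQ", "IQ", "IQ", "O", "O"],
)
```

The same function handles the BTS/ITS and BTE/ITE labels because the tag is simply whatever follows the leading B or I.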

STS and APKBC

Both problems connect task-oriented search with the procedural knowledge base.

STS
1(a) identify task (search task q → task t)
1(b) retrieve subtasks s1, …, sn
1(c) suggest and create sub search tasks p1, …, pk

APKBC
2(a) identify search task (task t → search task q)
2(b) collect related search tasks p1, …, pk
2(c) identify and generate subtasks s1, …, sn

Methods
• Exact matching or retrieval-based method
• Need a search intent model to retrieve task-oriented search tasks (future work)
• Refer to the PKB to retrieve related subtasks
• Generate queryable phrases/task descriptions using an algorithm that learns how searchers formulate queries/editors describe procedural knowledge

Outline

• Background
• Problem Definition
• Proposed Approach
• Experiment
– Data Preparation
– Experiment Settings
– Search Task Suggestion Result
– Procedural Knowledge Base Construction Result
• Conclusion

Data Preparation

• English wikiHow data dump
• AOL search query log
• Queries suggested by search engines
• Context extracted from search engines

Experiment Settings

• Sequence labeling vs. end-to-end evaluation

Sequence labeling evaluation
– Gold standard: automatically labeled parallel corpus
– Test set: 10-fold cross validation
– Evaluation methods: Precision, Recall, F-1, averaged on all test instances (macro-averaged) and on each task then across all tasks (micro-averaged); F-1 based ROUGE-2 and -S4
– Baseline methods: CRF (proposed), HMM (surface), LR, SVM, feature ablation

End-to-end evaluation
– Gold standard: manual judgment
– Test set: 50 randomly sampled triples
– Evaluation methods: macro-averaged and micro-averaged Precision@8, MAP
– Baseline methods: Google, Bing, wikiHow

Feature extractors, learners
– Stanford CoreNLP: sentence, token, stem, POS, dependency parse, chunk, named entity
– MALLET: CRF, HMM; LibLinear: LR, SVM
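The end-to-end metrics can be sketched for a single ranked list of suggestions with binary relevance judgments. Two choices here are assumptions rather than details from the slides: Precision@k divides by k even when fewer than k suggestions exist, and average precision is normalized by the number of relevant items found in the list. The macro and micro averaging schemes are not modeled.

```python
from typing import List


def precision_at_k(relevant: List[bool], k: int = 8) -> float:
    """Fraction of the top-k suggestions judged relevant (missing ranks count as misses)."""
    return sum(relevant[:k]) / k


def average_precision(relevant: List[bool]) -> float:
    """Mean of the precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[List[bool]]) -> float:
    """Average of the per-list average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)


judgments = [[True, False, True], [False, False]]
map_score = mean_average_precision(judgments)
```

Here the first list has relevant items at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2, and the second list contributes 0.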

Search Task Suggestion Result

• Query construction result
– The proposed CRF-based approach outperforms other classifiers*, esp. independent classifiers (max. SVM).
– Also outperforms each feature category** (max. W/ WORD), and LOU study (ns) (max. W/O POS).

[Figure: bar chart of Macro F1, Micro F1, ROUGE-2 and ROUGE-S4 (axis 0 to 0.8). Series: CRF, HMM, SVM, LR, TFIDF; W/ POS, W/ PAR, W/ LOC, W/ WORD; W/O POS, W/O PAR, W/O LOC, W/O WORD; LOCAL, CONTEXT. Value labels surviving in the transcript: .7471 .6930 .8112 .8087; .6855 .6612 .7922 .7892; .6803 .6175 .7713 .7657; .7466 .6870 .8113 .8082.]

Search Task Suggestion Result (cont’d)

• End-to-end example
– Slim down
– Play red alert 2

Task: slim down
PROPOSED | GOOGLE | BING
weight loss | slim down diet | the slim down club
heavy food | 7 day slim down | how to slim down fast
junk food | weight loss | slim down challenge
keep up the mood | slim down thighs | how to slim down legs

Task: play red alert 2
PROPOSED | GOOGLE | BING
build a barracks | red alert 2 complete (iso) original 2 disc | play red alert 2 game
build a war factory | play red alert 2 free | play ra2 online
radar chould | play red alert 2 online free | red alert 2 download
build a power plant/tesla reactor | play red alert 3 | free red alert 3

Search Task Suggestion Result (cont’d)

• End-to-end evaluation
– Proposed approach is tailored for task-oriented search.
– Current general-purpose commercial search engines are designed for entity-centric search.
– Current search engines tend to suggest queries by appending keywords such as product, image, logo, online, free, etc.

[Figure: bar chart of MacroP, MicroP and MAP (axis 0 to 0.5). Series: PROPOSED, GOOGLE, BING, LOG. Value labels surviving in the transcript: .4457 .4457 .3361; .0972 .0973 .0553; .0333 .0313 .0120; .0676 .0612 .0549.]

Automatic Procedural Knowledge Base Construction Result

• Task summary generation result
– All scores are lower than in the query construction task.
– CRF outperforms other classifiers* (max. SVM), each feature category (ns) (max. W/ POS), and LOU study (ns) (max. W/O WORD).

[Figure: bar chart of Macro F1, Micro F1, ROUGE-2 and ROUGE-S4 (axis 0 to 0.4). Series: CRF, HMM, SVM, LR, TFIDF; W/ POS, W/ PAR, W/ WORD; W/O POS, W/O PAR, W/O WORD; LOCAL, CONTEXT. Value labels surviving in the transcript: .4207 .3455 .4463 .4392; .1175 .1119 .2425 .2301; .3556 .3153 .3822 .3788; .4129 .3198 .4170 .4118.]

Automatic Procedural Knowledge Base Construction Result (cont’d)

• Task explanation generation result
– CRF outperforms other classifiers* (max. HMM, implying the importance of surface forms and sequence labeling nature).
– Also outperforms each feature category (ns) (max. W/ WORD).
– LOU study shows W/O PAR performs the best in terms of ROUGE.

[Figure: bar chart of Macro F1, Micro F1, ROUGE-2 and ROUGE-S4 (axis 0 to 0.4). Series: CRF, HMM, SVM, LR, TFIDF; W/ POS, W/ PAR, W/ WORD; W/O POS, W/O PAR, W/O WORD; LOCAL, CONTEXT. Value labels surviving in the transcript: .3853 .3577 .3698 .3686; .0000 .0050 .2450 .2324; .3639 .3176 .3489 .3472; .3718 .3468 .3804 .3793.]

Automatic Procedural Knowledge Base Construction Result (cont’d)

• End-to-end example
– A search engine would suggest “sign up for airbnb coupon” for “sign up for airbnb”, which implies an important resource for the task.

Task: sign up for airbnb
• Airbnb is no longer running the $50 OFF $200 promo but you can still save $25 OFF Your First Airbnb Stay of $75 or more by copying and pasting this link into your browser…

Task: make blueberry banana bread
• Please don’t use regular whole wheat in this recipe – the loaf will turn out very dense
• Add the wet ingredients – the egg mixture to the flour mixture and stir with a rubber spatula until just combined
• If you’re in need of a quick, easy and delicious way to use up the ripe bananas in your house… definitely

Task: become a cell phone dealer
• However, the cell phone provider may place restrictions on the manner in which you can use its company name, phone brands and images
• Visit the state’s business licensing agency’s website and your city’s occupational/business licensing department’s website to determine if you need a license for your prepaid cell phone business

Automatic Procedural Knowledge Base Construction Result (cont’d)

• End-to-end evaluation
– The automatic approach performs worse than manual curation in building a new PKB from scratch.
– But it still discovers relevant subtasks that are not covered in the current PKB, delivering fresh information that a manual process can hardly add or update instantly.

[Figure: bar chart of MacroP, MicroP and MAP (axis 0 to 1.0). Series: Proposed Summary Generation, Proposed Explanation Generation, wikiHow. Value labels surviving in the transcript: .0997 .0995 .0527; .2046 .2041 .1331; .9677 .9515 .9404.]


Conclusion

• Investigated two problems
– Search task suggestion using procedural knowledge
– Automatic procedural knowledge base construction from search activities
• Proposed to create a three-way parallel corpus of queries, query contexts, and task descriptions.
• Applied CRF-based sequence labeling models for query construction and task description generation.
• Future work
– User study
– Joint ranking
– APKBC using a natural language generation approach

Thanks! Questions?

Code & Resources: http://github.com/ziy/pkb

Related Workshop Talk: Answering Task-Oriented Questions from the Web, WebQA Workshop, Thursday 11am

Contact: Zi Yang, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, ziy@cs.cmu.edu

Acknowledgement: Travel is sponsored by SIGIR Student Travel Grant!

Parallel Corpus Construction Result

• Related query to subtask mapping
– Identified 1,182 query-task pairs using exact matching.
• Task to context mapping
– Selected the sentences that contain all the tokens in the task summary and 70%+ of the tokens in the task explanation.
– Annotated the minimal span that contains those overlapping tokens.

How Do Search Engines and Users Respond to Task-Oriented Queries?

• The number (and percentage) of suggested queries (or queries issued in the same session) that are mentioned within the description of some subtask.
– “New Words”: e.g. slim down -> slim down diet
– Low quality may be due to an over-simplified session detection method

[Figure: two bar charts comparing Google, Bing and Log on “Full phrase” and “New words”: averaged number (axis 0 to 0.8) and percentage (%) (axis 0 to 10).]

Search Task Suggestion

Given a task-oriented search task represented by query q
(a) Identify task
– Retrieve a list of candidate tasks from the PKB that mention the query q in either the summary or explanation.
– Select the task t that maximizes the likelihood of each candidate occurrence, i.e. p(y_t = BQ IQ … IQ | x_t; M_Q).
(b) Retrieve subtasks
– Retrieve the first-level subtasks s1, …, sn of task t.
(c) Suggest and create sub search task
– Extract query candidates for each subtask si using M_Q again.
– Rank by p(y_si = BQ IQ … IQ | x_si; M_Q).
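The three steps above can be wired together as follows. This is a sketch under strong assumptions: the PKB is reduced to a dictionary from a task description to its first-level subtask descriptions, and the trained model M_Q is replaced by a caller-supplied function (`extract`, hypothetical) that returns candidate phrases with probabilities. The toy extractor exists only to make the example runnable and is not a CRF.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for M_Q: text -> [(query phrase, probability)].
Extractor = Callable[[str], List[Tuple[str, float]]]


def suggest(q: str, pkb: Dict[str, List[str]], extract: Extractor) -> List[str]:
    """Return ranked sub search tasks for query q."""
    # (a) Identify task: candidates mention q; keep the one where the
    # extractor assigns q the highest probability.
    best_task: Optional[str] = None
    best_p = -1.0
    for task in pkb:
        if q not in task:
            continue
        p = max((prob for phrase, prob in extract(task) if phrase == q), default=0.0)
        if p > best_p:
            best_task, best_p = task, p
    if best_task is None:
        return []
    # (b) Retrieve first-level subtasks, (c) extract and rank query candidates.
    scored = [cand for sub in pkb[best_task] for cand in extract(sub)]
    scored.sort(key=lambda cand: cand[1], reverse=True)
    return [phrase for phrase, _ in scored]


def toy_extract(text: str) -> List[Tuple[str, float]]:
    """Toy rule: the first two words are the only candidate, scored by 1/length."""
    words = text.split()
    return [(" ".join(words[:2]), 1.0 / len(words))]


toy_pkb = {
    "slim down": ["avoid junk food every day", "keep up the mood"],
    "slim down a recipe": ["use less butter"],
}
suggestions = suggest("slim down", toy_pkb, toy_extract)
```

Passing the extractor in as a function keeps the pipeline independent of how the sequence labeler is trained or served.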

Automatic Procedural Knowledge Base Construction

Given a task t,
(a) Identify search task
– Apply M_Q to extract a task-oriented search query q.
(b) Collect related search tasks
– Identify the queries pi related to q in both search logs and suggested queries.
(c) Identify and generate subtasks
– Extract relevant document snippets for each related query pi from search engines.
– Apply M_TS/M_TE to extract task summary and explanation.

Search engines are able to correctly suggest related tasks to the user, rather than related entities or attributes.

Search logs reveal how a specific user works to accomplish a task.

Data Preparation

• English wikiHow data dump
– Used a modified version of the WikiTeam tool.
– Obtained 149,975 articles that are non-redirect, in namespace “0”, non-stub, with “Introduction” and “Steps”.
– Created a PKB of 1,488,587 tasks and 1,439,217 relations.
• AOL search query log
– 21M (10M unique) queries in total.
– After downcasing and removing non-alphanumeric characters, 639 unique queries match 619 task summaries, with whitespace and punctuation marks ignored.
– Identified 33,548 related query candidates by collecting the queries that were issued by the same user within 30 minutes after each matching query was issued.
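The 30-minute rule for collecting related query candidates can be sketched as below. The log layout (user id, timestamp, query) and the function name are assumptions for illustration; the AOL log's actual field layout is not described in the slides.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# Hypothetical log layout: (user_id, timestamp, query).
LogEntry = Tuple[str, datetime, str]


def related_queries(log: List[LogEntry], matching: Set[str],
                    window: timedelta = timedelta(minutes=30)) -> List[Tuple[str, str]]:
    """Pair each matching query with later queries by the same user inside the window."""
    pairs = []
    # Sort by user, then time, so each user's queries are contiguous and ordered.
    entries = sorted(log, key=lambda e: (e[0], e[1]))
    for i, (user, start, query) in enumerate(entries):
        if query not in matching:
            continue
        for other_user, when, other in entries[i + 1:]:
            if other_user != user or when - start > window:
                break
            if other != query:
                pairs.append((query, other))
    return pairs


t0 = datetime(2006, 3, 1, 12, 0)
example_log = [
    ("u1", t0, "grow taller"),
    ("u1", t0 + timedelta(minutes=10), "human growth hormone"),
    ("u1", t0 + timedelta(minutes=45), "pizza"),
    ("u2", t0 + timedelta(minutes=5), "grow taller exercises"),
]
pairs = related_queries(example_log, {"grow taller"})
```

A fixed time window is a deliberately simple session detector, which is consistent with the slides' remark that low-quality related queries may stem from an over-simplified session detection method.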

Data Preparation (cont’d)

• Queries suggested by search engines
– Randomly sampled 1,000 non-primitive tasks from the PKB that do not appear in the query log.
– Collected 9,906 related queries suggested by Google (avg. 6.11, max. 8) and 9,715 related queries suggested by Bing (avg. 5.99, max. 13) for the 1,639 queries.
• Context extracted from search engines
– Extracted URLs from Google’s first search result page, excluded the wikihow.com domain (for generalizability), the google.com domain, and URLs that have no subpaths (navigational search results), and downloaded 7,440 context documents.
– Used Boilerpipe to extract 7,437 documents as contexts, and an additional 3,512 documents for end-to-end evaluation.
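The URL exclusion rules for context documents can be sketched with the standard library. The function name `keep_url` and the treatment of subdomains (for example www.wikihow.com counts as wikihow.com) are assumptions; the slides only name the two domains and the "no subpaths" rule.

```python
from typing import List
from urllib.parse import urlparse

EXCLUDED_DOMAINS = ("wikihow.com", "google.com")


def keep_url(url: str) -> bool:
    """True if the URL is outside the excluded domains and has a subpath."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS):
        return False
    # A bare domain such as http://example.com/ is treated as navigational.
    return parts.path.strip("/") != ""


candidates: List[str] = [
    "http://www.wikihow.com/Grow-Taller",
    "https://www.google.com/search?q=grow+taller",
    "http://example.com/",
    "http://example.com/health/grow-taller",
]
kept = [u for u in candidates if keep_url(u)]
```

Matching on the parsed hostname rather than on a substring of the URL avoids discarding unrelated pages that merely mention an excluded domain in their path or query string.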

Search Task Suggestion Result (cont’d)

• 5 most contributing non-word features
– Query phrases are more likely extracted from the summary part of a description due to its clarity and conciseness.
– Singular nouns and verbs are indicators to begin a query.
– Verb phrase is used to decide whether to continue a query.

Rank | O → BQ | BQ → IQ | IQ → IQ
1 | POS:NNP | POS-1:VB | LOC:sum
2 | LOC:sum | LOC:sum | POS-1:IN
3 | DEP:ccomp | POS-1:VBP | VP
4 | POS:VB | POS-1:NNP | DEP:dobj
5 | DEP:nsubjpass | POS-1:NN | POS+1:JJ

Automatic Procedural Knowledge Base Construction Result (cont’d)

• 5 most contributing non-word features
– Nouns and verbs are crucial for constructing task descriptions.
– Verbs are preferred over nouns to begin the summary.
– To begin an explanation, the model prefers the “begin” of a sentence and/or a dependency label of nsubj.
– Verb phrases are also important.

Rank | Summary: O → BTS | BTS → ITS | ITS → ITS | Explanation: O → BTE | BTE → ITE | ITE → ITE
1 | POS:VB | POS-1:VB | POS-1:VBP | Begin | VP | POS-1:NN
2 | POS:VBP | POS-1:VBP | POS:NNP | POS:VBG | POS-1:NN | VP
3 | POS:NN | POS-1:NNP | POS-1:IN | POS:NN | POS-1:DT | POS-1:NNS
4 | DEP:appos | POS-1:NN | DEP:xcomp | DEP:compound | NP | POS-1:,
5 | POS:NNP | DEP:case | POS:JJR | DEP:nsubj | POS-1:VB | POS-1:NNP
