Post on 31-Jul-2015
Exploiting Time-based Synonyms in Searching Document Archives
Nattiya Kanhabua and Kjetil Nørvåg
Database System Group, Norwegian University of Science and Technology
Trondheim, Norway
JCDL’2010, June 21-25, Gold Coast, Australia
Kanhabua and Nørvåg: Exploiting Time-based Synonyms in Search
Outline
1 Introduction: Problem Statement; Contributions
2 Synonym Detection: Entity Recognition and Synonym Extraction; Improving the Accuracy of Time
3 Query Expansion: Time-based Synonyms; Ranking Time-independent Synonyms; Ranking Time-dependent Synonyms
4 Evaluation: Experiment Setting; Experimental Results
5 Conclusions: Conclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries, and news archives
Searching these resources is not straightforward: their contents are strongly time-dependent
Example: the query “Pope Benedict XVI” restricted to dates “before 2005” cannot retrieve documents about “Joseph Alois Ratzinger”
To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities; publication dates of documents are the temporal criteria
Scenario 1: query “Pope Benedict XVI” and documents written before 2005; documents about “Joseph Alois Ratzinger” are relevant
Scenario 2: query “Hillary R. Clinton” and documents written from 1997 to 2002; documents about “New York Senator” and “First Lady of the United States” are relevant
Challenge
Semantic gaps when searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1 Formal models: Wikipedia viewed as a temporal resource
2 Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3 Experiments: evaluate extracting synonyms and improving their time; evaluate query expansion using time-based synonyms
2 Synonym Detection
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
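Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the revision history is assumed to be already parsed (e.g., from an XML dump) into hypothetical `(title, timestamp, text)` tuples, and each snapshot time $t_k$ is taken to be the first day of a month. A snapshot $W_{t_k}$ then holds, for every page, the latest revision made at or before $t_k$.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical input format: (page title, revision timestamp, wiki text).
Revision = Tuple[str, datetime, str]


def monthly_snapshots(revisions: List[Revision],
                      times: List[datetime]) -> Dict[datetime, Dict[str, str]]:
    """Build W = {W_t1, ..., W_tz}: for each t_k, the latest revision of every
    page whose timestamp is at or before t_k."""
    ordered = sorted(revisions, key=lambda rev: rev[1])
    snapshots: Dict[datetime, Dict[str, str]] = {}
    current: Dict[str, str] = {}  # title -> text of its newest revision so far
    i = 0
    for t_k in sorted(times):
        # Replay every revision up to t_k; a newer revision overwrites an older one.
        while i < len(ordered) and ordered[i][1] <= t_k:
            title, _, text = ordered[i]
            current[title] = text
            i += 1
        snapshots[t_k] = dict(current)  # copy so later months do not alter this one
    return snapshots
```

For example, with revisions of page A on 15/02/2001 and 10/03/2001, the 01/03/2001 snapshot contains the older text and the 01/04/2001 snapshot the newer one. Copying every snapshot in memory is only reasonable for small inputs; a multi-terabyte history would call for streaming or an on-disk store.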
Example [Bunescu and Pasca, EACL’2006]: a Wikipedia title denotes a named entity if
1) it is a multi-word title and all its content words are capitalized
President_of_the_United_States ⇒ named entity
2) it is a single-word title with multiple capital letters
UNICEF and WHO ⇒ named entities
3) at least 75% of its occurrences in the article text itself are capitalized
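The three heuristics can be sketched as one function. This is an illustrative approximation, not Bunescu and Pasca's code: the function-word list is a small hypothetical stand-in, and rule 3 counts all occurrences of the title in the article, whereas a fuller version would, for instance, skip sentence-initial occurrences. The President_of_the_United_States example passes rule 1 only if function words such as “of” and “the” are ignored, so the sketch checks content words only.

```python
import re

# Hypothetical, non-exhaustive list of function words skipped by rule 1.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "for", "to", "by",
                  "with", "and", "or", "but", "not", "who", "which", "that"}


def is_named_entity(title: str, article_text: str) -> bool:
    """Capitalization heuristics in the spirit of Bunescu and Pasca (EACL'2006)."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: a multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with more than one capital letter (UNICEF, WHO).
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article are capitalized.
    hits = re.findall(rf"\b{re.escape(word)}\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    return sum(h[0].isupper() for h in hits) / len(hits) >= 0.75
```

A title like List_of_countries fails rule 1 because the content word “countries” is lowercase, while a common noun such as “Apple” fails rule 3 if most of its mentions in the article are lowercase.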
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
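Anchor-text extraction for one snapshot might look like the sketch below. It assumes each page's wiki text is available as a string and that entity titles use underscores; the regular expression handles only piped links of the form [[Target|anchor]] (optionally with a #section) and ignores templates, redirects, and other markup. The per-pair link counts it returns are one possible source of term frequencies, which is an assumption of this sketch.

```python
import re
from collections import Counter
from typing import Dict, Set

# Piped wiki links [[Target|anchor text]]; an optional #section in the target is dropped.
# Plain [[Target]] links are skipped because their anchor is just the title.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def synonym_snapshot(pages: Dict[str, str], entities: Set[str]) -> Counter:
    """Collect S_tk for one snapshot: (entity, anchor text) -> number of links."""
    pairs: Counter = Counter()
    for text in pages.values():
        for target, anchor in PIPED_LINK.findall(text):
            entity = target.strip().replace(" ", "_")  # normalize to title form
            if entity in entities:
                pairs[(entity, anchor.strip())] += 1
    return pairs
```

Applied to a page containing “[[President_of_the_United_States|Barack Obama]]”, it yields the pair (President_of_the_United_States, “Barack Obama”) with count 1.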
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
Note: the time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
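Given the entity-synonym pairs found in each snapshot, the time period of a relationship can be derived as the first and last snapshot in which it appears. A minimal sketch, assuming snapshots are keyed by their date; treating the span as continuous and ignoring gaps is a simplification of this sketch.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (named entity, synonym)


def time_periods(snapshots: Dict[date, Iterable[Pair]]) -> Dict[Pair, Tuple[date, date]]:
    """Time period of each entity-synonym pair: first and last snapshot containing it.

    Gaps in between are ignored, so a pair that disappears and returns keeps one period.
    """
    periods: Dict[Pair, Tuple[date, date]] = {}
    for t_k in sorted(snapshots):
        for pair in snapshots[t_k]:
            first = periods[pair][0] if pair in periods else t_k
            periods[pair] = (first, t_k)
    return periods
```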
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD’2002]
Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
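The burst-detection step can be illustrated with a two-state version of Kleinberg's model for batched arrivals. This is a sketch, not the Kleinberg implementation used in the experiments: it assumes monthly bins in which r[t] NYT articles mention a synonym out of d[t] articles in total, models the counts as binomial, and uses s = 2 and gamma = 1, the values listed in the experiment settings. A burst's weight is the cost saved by explaining its interval with the burst state rather than the base state.

```python
import math
from typing import List, Tuple


def _cost(p: float, r: int, d: int) -> float:
    """-ln of the binomial probability of r matching documents out of d at rate p."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(r: List[int], d: List[int], s: float = 2.0,
                  gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state burst detection over batched counts (after Kleinberg, KDD'2002).

    r[t] = documents in time bin t that mention the synonym,
    d[t] = all documents in bin t.
    Returns (first_bin, last_bin, weight) for each maximal run in the burst state.
    """
    if not r or sum(r) == 0:
        return []
    n = len(r)
    p0 = sum(r) / sum(d)
    p = [min(p0, 0.9999), min(s * p0, 0.9999)]  # base rate and burst rate
    up = gamma * math.log(n)  # cost of moving from the base state to the burst state

    # Viterbi over the two states; moving down or staying put is free.
    cost = [_cost(p[0], r[0], d[0]), up + _cost(p[1], r[0], d[0])]
    back: List[List[int]] = []
    for t in range(1, n):
        new_cost, ptr = [], []
        for q in (0, 1):
            from_base = cost[0] + (up if q == 1 else 0.0)
            from_burst = cost[1]
            prev = 0 if from_base <= from_burst else 1
            new_cost.append(min(from_base, from_burst) + _cost(p[q], r[t], d[t]))
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)

    # Backtrack the cheapest state sequence.
    states = [0 if cost[0] <= cost[1] else 1]
    for ptr in reversed(back):
        states.append(ptr[states[-1]])
    states.reverse()

    # Burst weight = cost saved by explaining the interval with the burst state.
    bursts, start = [], None
    for t, q in enumerate(states + [0]):  # trailing 0 closes a final burst
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            weight = sum(_cost(p[0], r[i], d[i]) - _cost(p[1], r[i], d[i])
                         for i in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts
```

For instance, detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [20] * 8) marks bins 3 to 5 as a single burst, since the drop in emission cost over those bins outweighs the one-time cost gamma · ln(8) of entering the burst state.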
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
3 Query Expansion
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature
$$\mathrm{TIDP}(s_j) = \mu \cdot \mathrm{pf}(s_j) + (1 - \mu) \cdot \mathrm{tf}(s_j)$$
$\mathrm{pf}(s_j)$ is the time-partition frequency of $s_j$, i.e., the number of time partitions in which $s_j$ occurs
$\mathrm{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\mathrm{tf}(s_j) = \frac{\sum_i \mathrm{tf}(s_j, p_i)}{\mathrm{pf}(s_j)}$
$\mu$ weights the importance of the temporal feature against the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
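TIDP can be computed as below. A minimal sketch, assuming term frequencies per time partition are already counted for each synonym of one entity; pf and the averaged tf are combined as raw counts because the slides do not state whether either feature is normalized. In practice, the two features may need rescaling to a common range for μ to balance them as intended.

```python
from typing import Dict, List


def tidp(tf_by_partition: Dict[str, List[int]], mu: float = 0.5) -> Dict[str, float]:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j).

    tf_by_partition[s] holds tf(s, p_i) for every time partition p_i (0 if absent).
    pf(s) counts partitions with tf > 0; tf(s) is the tf summed over partitions / pf(s).
    """
    scores = {}
    for syn, tfs in tf_by_partition.items():
        pf = sum(1 for tf in tfs if tf > 0)
        avg_tf = sum(tfs) / pf if pf else 0.0
        scores[syn] = mu * pf + (1 - mu) * avg_tf
    return scores
```

With made-up counts, a synonym seen in all four partitions with tf values [2, 3, 1, 2] scores 0.5 · 4 + 0.5 · 2 = 3.0.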
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = \mathrm{tf}(s_j, t_k)$$
where $\mathrm{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest
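TDP ranking reduces to sorting by frequency within one time partition. A sketch for a single named entity, assuming tf(s_j, t_k) is stored in a dictionary keyed by (synonym, partition); this keying scheme is hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple


def rank_time_dependent(tf: Dict[Tuple[str, date], int], t_k: date,
                        top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank one entity's synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf maps (synonym, time partition) -> term frequency; other partitions are ignored.
    """
    at_tk = [(syn, f) for (syn, t), f in tf.items() if t == t_k and f > 0]
    return sorted(at_tk, key=lambda sf: sf[1], reverse=True)[:top_k]
```

With made-up counts where “Cardinal Joseph Ratzinger” has tf 7 and “Joseph Ratzinger” tf 3 in 01/2004, both are returned in that order for $t_k$ = 01/2004, while “Pope Benedict XVI”, counted only in a later partition, is not considered.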
4 Evaluation
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time periods
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 terabytes
4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of rate of second state to base state 2; ratio of rate of each subsequent state to previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights $w_i$:
$$q_{exp} = q_{org}\ \ s_1{\wedge}w_1\ \ s_2{\wedge}w_2\ \cdots\ s_k{\wedge}w_k$$
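Building the expanded query can be sketched as follows. The `term^weight` boost notation mirrors the ∧ in the formula and resembles the weighting syntax of Terrier's query language, but the exact syntax for phrases and boosts should be checked against the Terrier version in use. Dropping the entity's own name from the candidates and quoting multi-word synonyms as phrases are assumptions of this sketch. Setting weighted=False gives an unweighted variant in the spirit of SQE, before pseudo relevance feedback is applied.

```python
from typing import Dict


def expand_query(q_org: str, tidp_scores: Dict[str, float], k: int = 3,
                 weighted: bool = True) -> str:
    """Append the top-k synonyms by TIDP score to the original query.

    With weighted=True each synonym carries its TIDP score as a boost (syn^w).
    """
    # The entity's own name adds nothing to the original query, so skip it.
    candidates = [(s, w) for s, w in tidp_scores.items()
                  if s.lower() != q_org.lower()]
    top = sorted(candidates, key=lambda sw: sw[1], reverse=True)[:k]
    parts = [q_org]
    for syn, w in top:
        term = f'"{syn}"' if " " in syn else syn  # keep multi-word synonyms together
        parts.append(f"{term}^{w:.2f}" if weighted else term)
    return " ".join(parts)
```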
Measurement: Mean Average Precision (MAP), R-precision, and recall
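For reference, the evaluation measures can be computed per query as below, using their standard textbook definitions; trec_eval, the usual tool for TREC runs, may differ in edge cases such as ties or truncated rankings. Precision at k, used in the time-dependent experiment, is included as well.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k positions holding relevant documents (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document retrieved;
    relevant documents never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking [d1, d2, d3, d4] with relevant set {d1, d3, d5}, AP = (1/1 + 2/3)/3 ≈ 0.556, and R-precision and recall are both 2/3.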
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com: more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20, and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a temporal query is a named entity plus a time period)
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization”, or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries using the two different NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision, and recall (* indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
QUEST: Query Expansion using Synonyms over Time
5 Conclusions
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Outline
1 Introduction: Problem Statement; Contributions
2 Synonym Detection: Entity Recognition and Synonym Extraction; Improving the Accuracy of Time
3 Query Expansion: Time-based Synonyms; Ranking Time-independent Synonyms; Ranking Time-dependent Synonyms
4 Evaluation: Experiment Setting; Experimental Results
5 Conclusions: Conclusions and Future Work
1 Introduction
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives.
Searching such resources is not straightforward, because their contents are strongly time-dependent.
For example, for the query "Pope Benedict XVI" restricted to dates "before 2005", a search engine is unable to retrieve documents about "Joseph Alois Ratzinger".
To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed.
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008].
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., through changes of roles, name alterations or semantic shift.
Synonyms are different words with similar meanings.
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity.
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005.
What are time-based synonyms?
Time-independent synonyms are invariant to time.
Time-dependent synonyms are relevant to a particular time period, i.e., the entity-synonym relationships change over time.
Application
News archive search: search terms are named entities, and the publication dates of documents serve as temporal criteria.
Scenario 1: query "Pope Benedict XVI" for documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: query "Hillary R. Clinton" for documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge: semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time.
Contributions
1 Formal models: Wikipedia viewed as a temporal resource
2 Proposed approaches:
- Discover time-based synonyms over time
- Improve the accuracy of the time of synonyms
- Expand a query using time-based synonyms
3 Experiments:
- Evaluate extracting synonyms and improving their time
- Evaluate query expansion using time-based synonyms
2 Synonym Detection
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$.
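Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the revision history is a hypothetical list of (page title, timestamp, wikitext) tuples, a snapshot $W_{t_k}$ is taken to be the latest revision of each page at or before $t_k$, and the named-entity test is passed in as a function.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical input format: (page title, revision timestamp, wikitext).
Revision = Tuple[str, datetime, str]


def monthly_points(start: datetime, end: datetime) -> List[datetime]:
    """Step 1 helper: the first day of every month from start to end (g = month)."""
    points: List[datetime] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        points.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def snapshot(revisions: Iterable[Revision], t_k: datetime) -> Dict[str, str]:
    """Step 1: W_tk, assumed to be the latest revision of each page at or before t_k."""
    latest: Dict[str, Tuple[datetime, str]] = {}
    for title, ts, text in revisions:
        if ts <= t_k and (title not in latest or ts > latest[title][0]):
            latest[title] = (ts, text)
    return {title: text for title, (_, text) in latest.items()}


def named_entities(snap: Dict[str, str],
                   is_entity: Callable[[str, str], bool]) -> Set[str]:
    """Step 2: E_tk, the page titles that the supplied test accepts as named entities."""
    return {title for title, text in snap.items() if is_entity(title, text)}
```

For instance, `monthly_points(datetime(2001, 3, 1), datetime(2008, 3, 1))` yields 85 time points, the same number as the monthly Wikipedia snapshots from 03/2001 to 03/2008 used in the experiments.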
Example heuristics [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) At least 75% of the occurrences of the title in the article text itself are capitalized
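A minimal Python sketch of the three heuristics. Two points are our own assumptions: the `FUNCTION_WORDS` set is a small, illustrative stand-in for a real list of prepositions, determiners and conjunctions, and heuristic 3 is applied only to single-word titles not already accepted by heuristic 2, so that the capitalization of a word like "Apple" in running text decides the case.

```python
import re

# Illustrative, hand-picked function words ignored by heuristic 1 (an assumption).
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or",
                  "the", "to", "with"}


def is_named_entity(title: str, article_text: str) -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Heuristic 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Heuristic 2: a single-word title with more than one capital letter (UNICEF, WHO).
    if sum(1 for c in word if c.isupper()) > 1:
        return True
    # Heuristic 3: at least 75% of the title's occurrences in the article are capitalized.
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for h in hits if h[0].isupper())
    return capitalized / len(hits) >= 0.75
```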
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Example: in the link [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
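The two extraction steps can be sketched with a regular expression over raw wikitext. This is an illustrative assumption rather than the authors' code: it handles only plain [[Target]] and [[Target|anchor]] links (a "#section" suffix on the target is dropped), normalizes titles the way MediaWiki does by default (spaces to underscores, first letter upper-cased), and counts each anchor so that term frequencies are available for ranking.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set

# [[Target]] or [[Target|anchor text]]; an optional "#section" after the target is dropped.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def normalize_title(title: str) -> str:
    """Spaces become underscores and the first letter is upper-cased."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def synonym_snapshot(texts: Iterable[str], entities: Set[str]) -> Dict[str, Counter]:
    """S_tk: for each entity page, a counter of the anchor texts that link to it."""
    relations: Dict[str, Counter] = {}
    for text in texts:
        for match in LINK.finditer(text):
            target = normalize_title(match.group(1))
            if target not in entities:
                continue  # only links into named-entity pages yield synonyms
            anchor = (match.group(2) or match.group(1)).strip()
            relations.setdefault(target, Counter())[anchor] += 1
    return relations
```

Each (entity, anchor) pair in the result corresponds to one relationship $\xi_{ij} = (e_i, s_j)$ at time $t_k$.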
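Extracting synonyms
Output: entity-synonym relationships and their time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the contents, so they need not match the period in which a synonym was actually in use (e.g., "Cardinal Joseph Ratzinger" before 2005).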
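Since each relationship is dated by the snapshots in which it appears, its time period can be read off as the first and last month of occurrence. A minimal sketch, assuming the monthly synonym snapshots are given as a hypothetical mapping from (year, month) to per-entity anchor counts, and ignoring gaps between occurrences:

```python
from collections import Counter
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month)


def synonym_periods(
        snapshots: Dict[Month, Dict[str, Counter]]) -> Dict[Tuple[str, str], Tuple[Month, Month]]:
    """Map each (entity, synonym) pair to the first and last month it occurs in."""
    periods: Dict[Tuple[str, str], Tuple[Month, Month]] = {}
    for month in sorted(snapshots):  # chronological order
        for entity, anchors in snapshots[month].items():
            for synonym, freq in anchors.items():
                if freq <= 0:
                    continue
                key = (entity, synonym)
                first = periods[key][0] if key in periods else month
                periods[key] = (first, month)  # gaps between occurrences are ignored
    return periods
```

With made-up counts placed in 05/2006, 02/2007 and 03/2009, this yields the periods 05/2006 - 03/2009 for "Senator Barack Obama" and 02/2007 - 03/2009 for "Barack Hussein Obama II", the shape of the output listed in the table.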
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
- 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
- Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
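A minimal sketch of a two-state version of Kleinberg's batched burst model, written for illustration; it is neither the authors' code nor Kleinberg's own implementation. It assumes r[t] is the number of NYT articles in batch t (e.g., one month) that mention a synonym and d[t] is the total number of articles in that batch. The defaults s = 2 and gamma = 1 mirror the burst-detection settings listed in the experiment setting; the burst weight is the cost saved by labelling a run as bursty, as in Kleinberg's formulation.

```python
import math
from typing import List, Tuple


def _sigma(p: float, r: int, d: int) -> float:
    """Cost of seeing r relevant documents out of d under rate p (negative log-likelihood).

    The binomial coefficient is identical for both states, so it is omitted.
    """
    return -(r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(r: List[int], d: List[int], s: float = 2.0,
                  gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state batched burst detection in the style of Kleinberg (2002).

    r[t]: documents in batch t that mention the synonym; d[t]: all documents in batch t.
    Returns (start, end, weight) for every maximal run of the burst state.
    """
    n = len(r)
    if n == 0 or sum(r) == 0 or sum(d) == 0:
        return []
    p0 = min(sum(r) / sum(d), 0.9999)
    rates = [p0, min(s * p0, 0.9999)]   # base state, burst state
    up_cost = gamma * math.log(n)       # cost of moving from the base to the burst state
    cost = [0.0, math.inf]              # the sequence starts in the base state
    back: List[List[int]] = []
    for t in range(n):
        new_cost, ptr = [0.0, 0.0], [0, 0]
        for j in (0, 1):
            # Staying put or dropping back to the base state is free; moving up is not.
            cands = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if cands[0] <= cands[1] else 1
            new_cost[j] = cands[best] + _sigma(rates[j], r[t], d[t])
            ptr[j] = best
        cost = new_cost
        back.append(ptr)
    # Viterbi backtracking of the cheapest state sequence.
    states = [0] * n
    state = 0 if cost[0] <= cost[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    # Maximal runs of the burst state; weight = cost saved by being in the burst state.
    bursts: List[Tuple[int, int, float]] = []
    t = 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += _sigma(rates[0], r[t], d[t]) - _sigma(rates[1], r[t], d[t])
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

For example, with hypothetical monthly counts `[1, 1, 1, 10, 12, 11, 1, 1]` out of 100 articles per month, the only burst returned spans indices 3 to 5, with a weight of about 9.2.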
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
3 Query Expansion
Classifying synonyms into two types
Definition:
Class A, time-independent: robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B, time-dependent: related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$
- $pf(s_j)$ is the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the averaged term frequency of $s_j$ over the time partitions, $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ weights the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high averaged frequency over time
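A minimal sketch of the TIDP score, assuming each synonym's term frequencies are given as a hypothetical list with one entry per time partition (0 where it does not occur) and $\mu = 0.5$ as in the experiments. The two features are combined exactly as the formula is written, without any normalization.

```python
from typing import Dict, List, Tuple


def tidp(tf_by_partition: Dict[str, List[int]], mu: float = 0.5) -> Dict[str, float]:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * mean tf of s_j over the partitions it occurs in."""
    scores: Dict[str, float] = {}
    for synonym, tfs in tf_by_partition.items():
        occurring = [tf for tf in tfs if tf > 0]
        pf = len(occurring)                          # partitions containing s_j
        avg_tf = sum(occurring) / pf if pf else 0.0  # sum_i tf(s_j, p_i) / pf(s_j)
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Highest-scoring synonyms first; ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

In the made-up counts used in the check below, "Barack Hussein Obama II" occurs in all four partitions and outranks "Sen. Barack Obama", which is used more heavily per partition but in only two, illustrating the robustness factor.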
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
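A minimal sketch of ranking by TDP, assuming term frequencies are stored in a hypothetical nested mapping from synonym to time point (here, year strings) to count; the counts used in the check are invented for illustration. Synonyms that do not occur at $t_k$ are dropped.

```python
from typing import Dict, List, Tuple


def tdp_rank(tf_at: Dict[str, Dict[str, int]], t_k: str, k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    scored = [(syn, by_time.get(t_k, 0)) for syn, by_time in tf_at.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]  # keep synonyms occurring at t_k
    scored.sort(key=lambda pair: (-pair[1], pair[0]))     # highest tf first
    return scored[:k]
```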
4 Evaluation
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection:
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools:
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection:
- TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools:
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the original query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights (^ denotes a query-term boost):
  q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k
Measurement: Mean Average Precision (MAP), R-precision and recall
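A sketch of how the expanded query string could be assembled from the synonyms' TIDP scores. It assumes a Terrier-style `term^weight` boost and assumes that multi-word synonyms are sent as quoted phrases; both the phrase-boost form and the two-decimal formatting are our assumptions, to be checked against the engine's query-language documentation.

```python
from typing import Dict


def expand_query(q_org: str, weights: Dict[str, float], k: int = 3) -> str:
    """Append the top-k synonyms to the original query, each boosted by its weight."""
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for synonym, w in top:
        # Quoting multi-word synonyms as phrases is an assumption about the engine's syntax.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{w:.2f}")
    return " ".join(parts)
```

For example, `expand_query("Barack Obama", {"Barack Hussein Obama II": 3.5, "Sen. Barack Obama": 3.0, "Obama": 1.0}, k=2)` returns `Barack Obama "Barack Hussein Obama II"^3.50 "Sen. Barack Obama"^3.00`.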
Query expansion using time-dependent synonyms
Data collection:
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (a temporal query consists of a named entity and a time period):
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia:
NER Method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2, since k > 2 brings noise into the NERQ process
Number of queries under the two NERQ methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision, and Recall (statistical significance tested at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (statistical significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a Temporal Query, i.e., a keyword w_q and a time t_q
TSQ: search with a Temporal Query expanded with Synonyms w.r.t. time t_q
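The precision values above follow the standard P@k definition. The sketch below computes P@k for one ranking and its mean over a set of queries; the function names and the dictionary-based run format are illustrative assumptions, not the evaluation tooling used in the experiments.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant.
    Missing positions (ranking shorter than k) count as non-relevant."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Average P@k over all queries that have a run."""
    return sum(precision_at_k(runs[q], qrels.get(q, set()), k) for q in runs) / len(runs)


ranking = [f"d{i}" for i in range(1, 31)]
for k in (10, 20, 30):
    print(k, round(precision_at_k(ranking, {"d3"}, k), 4))
```

As plain arithmetic, a ranking with exactly one relevant document within the top 10 yields P@10 = 0.1, P@20 = 0.05, and P@30 ≈ 0.0333.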
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Problem statement
- In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries, and news archives
- Searching such resources is not straightforward, because their contents are strongly time-dependent
- Example: given the query "Pope Benedict XVI" and the dates "before 2005", a search is unable to retrieve documents about "Joseph Alois Ratzinger"
- To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed
Observation
- Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
- They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., through changes of roles, name alterations, or semantic shift
- Synonyms are different words with similar meanings
- In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
- E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005
What are time-based synonyms?
- Time-independent synonyms are invariant to time
- Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
- News archive search: search terms are named entities, and the publication dates of documents serve as temporal criteria
Scenario 1: the query "Pope Benedict XVI" with documents written before 2005; documents about "Joseph Alois Ratzinger" are relevant
Scenario 2: the query "Hillary R. Clinton" with documents written from 1997 to 2002; documents about "New York Senator" and "First Lady of the United States" are relevant
Challenge
- Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
- Discover time-based synonyms over time
- Improve the accuracy of the time of synonyms
- Expand a query using time-based synonyms
3. Experiments:
- Evaluate extracting and improving the time of synonyms
- Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
Example heuristics [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) Titles for which at least 75% of occurrences in the article text itself are capitalized
Example:
- $e_i$: President_of_the_United_States
- $t_k$: 11/2001
- $s_j$: "George W. Bush"
- $\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
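The three title heuristics can be sketched as a single classifier. This is a simplified reading, not Bunescu and Pasca's code: the function-word list is an assumption, heuristic 3 counts all case-insensitive occurrences of the title in the article text (their method also discounts sentence-initial positions, which this sketch does not), and `is_named_entity_title` is a hypothetical name.

```python
import re

# Assumed list of function words ignored by heuristic 1.
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "at", "by"}


def is_named_entity_title(title: str, article_text: str = "") -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Heuristic 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(ch.isupper() for ch in words[0]) >= 2:
        # Heuristic 2: single-word title with multiple capital letters.
        return True
    # Heuristic 3: at least 75% of the title's occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(" ".join(words)) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity_title("President_of_the_United_States"))  # heuristic 1
print(is_named_entity_title("UNICEF"))                          # heuristic 2
print(is_named_entity_title("Apple", "an apple a day; apple pie"))  # heuristic 3 fails
```

Titles that fail heuristics 1 and 2 fall through to heuristic 3, so the article text only matters for those.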
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
Extracting synonyms
Output: entity-synonym relationships and their time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The times of synonyms are the timestamps of the Wikipedia articles (spanning 8 years) in which they appear, not temporal expressions extracted from the contents.
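The two extraction steps can be sketched as: pull (target, anchor text) pairs out of piped wiki links in each monthly snapshot, and record, for every entity-synonym pair, the first and last snapshot month in which it appears. Several simplifications are assumptions of this sketch: only piped links are read, MediaWiki's title normalization and redirects are ignored, months are `'YYYY-MM'` strings, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Piped internal link: [[Target|anchor text]] or [[Target#Section|anchor text]].
# Unpiped links ([[Target]]) are ignored in this sketch.
PIPED_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?\|([^\]]+)\]\]")


def anchor_pairs(wikitext: str) -> Iterator[Tuple[str, str]]:
    """Yield (target page, anchor text) for each piped link in an article."""
    for target, anchor in PIPED_LINK.findall(wikitext):
        yield target.strip().replace(" ", "_"), anchor.strip()


def synonym_periods(snapshots: Iterable[Tuple[str, List[str]]],
                    entities: Set[str]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """snapshots: (month as 'YYYY-MM', article texts of that snapshot).
    Returns {(entity, synonym): (first month seen, last month seen)}."""
    periods: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for month, texts in snapshots:
        for text in texts:
            for target, anchor in anchor_pairs(text):
                if target not in entities:
                    continue  # the link does not point to a recognized entity page
                first, last = periods.get((target, anchor), (month, month))
                # 'YYYY-MM' strings sort chronologically, so min/max work directly.
                periods[(target, anchor)] = (min(first, month), max(last, month))
    return periods


print(list(anchor_pairs("[[President_of_the_United_States|Barack Obama]] said")))
```

Taking the span between the first and last snapshot is one way to obtain periods shaped like those in the Output table; the slides do not say whether gaps inside that span are handled.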
Improving the accuracy of time using burst detection
- Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
- 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
- Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
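The burst detection step can be sketched with the two-state, batched variant of Kleinberg's model, using the parameters listed under the experiment setting (2 states, rate ratio s = 2, gamma = 1). This is an independent re-implementation for illustration, not Kleinberg's released code, and it assumes each batch t (e.g., one month of NYT articles) has d_t documents, r_t of which mention the synonym. The burst weight follows Kleinberg's definition as the total cost saved by the burst state over the base state; whether it reproduces the weights in the table above is not confirmed.

```python
import math
from typing import List, Tuple


def _neg_log_binom(p: float, r: int, d: int) -> float:
    """Cost of seeing r relevant documents out of d when the emission rate is p."""
    log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_choose + r * math.log(p) + (d - r) * math.log(1 - p))


def detect_bursts(r: List[int], d: List[int], s: float = 2.0,
                  gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state batched burst detection; returns (start, end, weight) per burst,
    with start/end as inclusive batch indices."""
    n = len(r)
    if n == 0 or sum(d) == 0:
        return []
    p0 = sum(r) / sum(d)           # base rate over the whole stream
    if not 0 < p0 < 1:
        return []
    p1 = min(s * p0, 0.9999)       # burst rate, kept below 1
    rates = (p0, p1)
    up_cost = gamma * math.log(n)  # cost of entering the burst state

    # Viterbi over the two states; start in the base state.
    cost = [0.0, math.inf]
    backptr: List[Tuple[int, int]] = []
    for t in range(n):
        new_cost, ptr = [0.0, 0.0], [0, 0]
        for j in (0, 1):
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            new_cost[j] = options[best] + _neg_log_binom(rates[j], r[t], d[t])
            ptr[j] = best
        cost = new_cost
        backptr.append((ptr[0], ptr[1]))

    # Recover the optimal state sequence.
    states = [0] * n
    state = 0 if cost[0] <= cost[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = backptr[t][state]

    # Collect maximal runs of the burst state with their weights.
    bursts = []
    t = 0
    while t < n:
        if states[t] == 0:
            t += 1
            continue
        start = t
        while t < n and states[t] == 1:
            t += 1
        weight = sum(_neg_log_binom(p0, r[u], d[u]) - _neg_log_binom(p1, r[u], d[u])
                     for u in range(start, t))
        bursts.append((start, t - 1, weight))
    return bursts


# Hypothetical monthly counts: mentions of a synonym out of 100 articles per month.
print(detect_bursts([1, 1, 1, 1, 20, 25, 22, 1, 1, 1], [100] * 10))
```

With these illustrative counts, months 4 to 6 form a single burst; a flat stream such as five mentions in every month produces no burst, because the burst state never pays for itself.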
Classifying synonyms into two types
Definition
Class A: time-independent
- Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
- E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
- Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
- E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$
- $pf(s_j)$ is the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the averaged tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ weighs the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high average frequency over time
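The TIDP formula translates directly into code. This minimal sketch assumes the raw partition count and the averaged term frequency are combined exactly as written, without normalization (if the features were normalized in the original system, scores would differ), and that only partitions with tf > 0 count towards pf. The data layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition maps a time partition p_i to tf(s_j, p_i); partitions with
    zero frequency do not count towards pf(s_j)."""
    counts = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(counts)
    if pf == 0:
        return 0.0
    avg_tf = sum(counts) / pf  # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms: Dict[str, Dict[str, int]], k: int,
                          mu: float = 0.5) -> List[Tuple[str, float]]:
    """Return the top-k synonyms by TIDP score (ties broken alphabetically)."""
    scores = {s: tidp(parts, mu) for s, parts in synonyms.items()}
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return [(s, scores[s]) for s in ranked[:k]]


# Hypothetical per-partition frequencies, for illustration only.
syns = {
    "Barack Hussein Obama II": {"2007-02": 3, "2007-03": 2, "2008-01": 4},
    "Sen. Barack Obama": {"2007-07": 10, "2007-08": 0},
}
print(rank_time_independent(syns, 2))
```

Because pf and the averaged tf live on different scales, the raw mixture lets a frequent but short-lived synonym outrank a steadier one, which is one reason a normalization step might be worth checking against the paper.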
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition:
- Only term frequencies are used to measure the importance of synonyms
- Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
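Since TDP is just the term frequency at the query time, ranking reduces to a lookup and a sort. A minimal sketch, assuming frequencies are stored per synonym and per time key (the time-key format and names are hypothetical); synonyms that do not occur at $t_k$ are dropped.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf_by_time: Dict[str, Dict[str, int]], t_k: str,
                        k: int) -> List[Tuple[str, int]]:
    """TDP(s_j, t_k) = tf(s_j, t_k): rank synonyms by their frequency at t_k only.

    tf_by_time[s_j][t] holds tf(s_j, t); synonyms absent at t_k are dropped."""
    scores = {s: by_time.get(t_k, 0) for s, by_time in tf_by_time.items()}
    ranked = sorted((s for s in scores if scores[s] > 0), key=lambda s: (-scores[s], s))
    return [(s, scores[s]) for s in ranked[:k]]


# Hypothetical yearly frequencies, for illustration only.
data = {
    "Cardinal Joseph Ratzinger": {"2004": 7, "2006": 1},
    "Joseph Ratzinger": {"2004": 3},
    "Pope Benedict XVI": {"2006": 9},
}
print(rank_time_dependent(data, "2004", 2))
```

The same synonym can rank high at one $t_k$ and disappear at another, which is exactly the time dependence the ranking is meant to capture.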
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes
- 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg, with: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
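The 85-snapshot count is consistent with monthly partitioning (g = month): monthly snapshots from 01/03/2001 through 01/03/2008, inclusive, number 85. The sketch below enumerates those snapshot times and builds a snapshot by taking the latest revision of each page on or before $t_k$, which is one reading of "current revisions at time $t_k$". The helper names are hypothetical, and the actual pipeline (MWDumper into Berkeley DB) is not reproduced.

```python
from datetime import date
from typing import Dict, List, Tuple


def monthly_snapshot_times(start: date, end: date) -> List[date]:
    """First day of every month from start to end inclusive (granularity g = month)."""
    times = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        times.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return times


def snapshot_at(revisions: List[Tuple[str, date, str]], t_k: date) -> Dict[str, str]:
    """W_{t_k}: the latest revision of every page with a timestamp on or before t_k."""
    latest: Dict[str, Tuple[date, str]] = {}
    for page, timestamp, text in revisions:
        if timestamp <= t_k and (page not in latest or timestamp >= latest[page][0]):
            latest[page] = (timestamp, text)
    return {page: text for page, (_, text) in latest.items()}


times = monthly_snapshot_times(date(2001, 3, 1), date(2008, 3, 1))
print(len(times), times[0], times[-1])  # 85 2001-03-01 2008-03-01
```

Pages created after $t_k$ are simply absent from $W_{t_k}$, so the set of recognizable entities can grow from one snapshot to the next.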
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$, where $w_i = \mathrm{TIDP}(s_i)$
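As a rough illustration of this expansion step, the sketch appends the top-k synonyms to the original query, each boosted by its TIDP score. The `"phrase"^weight` output syntax is an assumption modelled on Terrier-style term boosting; `expand_query` and the scores in the example are hypothetical.

```python
def expand_query(original_query, synonym_scores, k=3):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from {synonym: TIDP score}.

    The '"phrase"^weight' boost syntax is an assumed, Terrier-style format.
    """
    # Highest TIDP first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(synonym_scores.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original_query]
    for synonym, weight in ranked:
        # Quote (possibly multi-word) synonyms as phrases and attach the score as a boost.
        parts.append(f'"{synonym}"^{weight:.2f}')
    return " ".join(parts)


# Hypothetical TIDP scores, for illustration only.
scores = {"barack hussein obama ii": 2.5, "senator obama": 1.75, "obama": 3.0}
print(expand_query("barack obama", scores, k=2))
# barack obama "obama"^3.00 "barack hussein obama ii"^2.50
```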
Measurement: mean average precision (MAP), R-precision and recall
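These measures are standard. The helpers below are a plain-Python sketch of how they are usually computed per topic (MAP then averages AP over topics); they are not the trec_eval or Terrier implementations, and the function names are illustrative. `precision_at_k` also covers the P@10, P@20 and P@30 measures used for the time-dependent experiments.

```python
def precision_at_k(ranking, relevant, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of P@i over the ranks i holding a relevant document, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def r_precision(ranking, relevant):
    """P@R, where R is the number of relevant documents for the topic."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def recall(ranking, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return sum(1 for doc in set(ranking) if doc in relevant) / len(relevant)


def mean_average_precision(runs):
    """MAP: runs maps a topic id to (ranking, set of relevant document ids)."""
    return sum(average_precision(r, rel) for r, rel in runs.values()) / len(runs)


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(average_precision(ranking, relevant))  # (1/1 + 2/3) / 3
print(r_precision(ranking, relevant), recall(ranking, relevant))
```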
Query expansion using time-dependent synonyms: data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications.
20 strongly time-dependent queries were selected.
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: the accuracy of time periods was assessed on 500 randomly selected entity-synonym relationships.
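The two filters could be sketched as follows. The slide does not say exactly how "time interval" and "average frequency" are measured, so this sketch assumes the interval is the number of months between the first and last snapshot containing the pair, that average frequency is the mean anchor-text count over those snapshots, and that a pair must pass both thresholds to be kept. The `Relationship` record, its single `category` label and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Relationship:
    entity: str
    synonym: str
    start: Tuple[int, int]   # (year, month) of the first snapshot containing the pair
    end: Tuple[int, int]     # (year, month) of the last snapshot containing the pair
    frequencies: List[int]   # hypothetical: anchor-text count per snapshot where it occurs
    category: str = ""       # hypothetical simplification of Wikipedia categories


def interval_in_months(start, end):
    """Elapsed months between first and last month (assumed meaning of 'time interval')."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def bpf_filter(relationships, min_months=6, min_avg_freq=2.0):
    """Keep pairs with interval >= 6 months and average frequency >= 2 (assumed reading)."""
    kept = []
    for rel in relationships:
        avg = sum(rel.frequencies) / len(rel.frequencies) if rel.frequencies else 0.0
        if interval_in_months(rel.start, rel.end) >= min_months and avg >= min_avg_freq:
            kept.append(rel)
    return kept


def bpcf_filter(relationships, categories=("people", "organization", "company")):
    """BPF filtering plus a restriction to the listed categories."""
    return [rel for rel in bpf_filter(relationships) if rel.category in categories]


# Toy relationships, made up for illustration.
rels = [
    Relationship("Pope Benedict XVI", "Cardinal Joseph Ratzinger", (2005, 5), (2009, 3), [3, 4, 5], "people"),
    Relationship("Barack Obama", "Obama", (2008, 1), (2008, 3), [5, 5, 5], "people"),
    Relationship("Kmart", "K-Mart", (2003, 1), (2008, 1), [1, 1, 2], "company"),
    Relationship("Thriller (album)", "Thriller", (2002, 1), (2009, 3), [4, 2], "album"),
]
print([r.synonym for r in bpf_filter(rels)])   # ['Cardinal Joseph Ratzinger', 'Thriller']
print([r.synonym for r in bpcf_filter(rels)])  # ['Cardinal Joseph Ratzinger']
```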
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2, since k > 2 brings noise into the NERQ process
Number of queries recognized by the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (* indicates statistical significance at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model Bose-Einstein 1 (Bo1).
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (* indicates statistical significance at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword w_q and a time t_q
TSQ: search a temporal query and expand it with synonyms w.r.t. time t_q
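A TSQ-style expansion might look roughly like this: given a keyword w_q and a time t_q, keep only the synonyms whose time period overlaps t_q and append the strongest ones. The synonym index, its year-granularity periods and the weights are hypothetical stand-ins for the time-dependent synonyms described elsewhere in the talk; how the expanded terms are submitted to the search engine is not shown.

```python
def overlaps(a, b):
    """True if the closed year ranges a = (start, end) and b = (start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporal_synonym_query(keyword, t_q, synonym_index, k=2):
    """TSQ sketch: keep the keyword w_q and add the top-k synonyms valid during t_q."""
    candidates = [
        (synonym, weight)
        for synonym, period, weight in synonym_index.get(keyword, [])
        if overlaps(period, t_q)
    ]
    # Strongest synonyms first; ties broken alphabetically.
    candidates.sort(key=lambda c: (-c[1], c[0]))
    return [keyword] + [synonym for synonym, _ in candidates[:k]]


# Toy index of (synonym, (first year, last year), weight); periods and weights are made up.
index = {
    "Pope Benedict XVI": [
        ("Cardinal Ratzinger", (1988, 2005), 0.9),
        ("Joseph Ratzinger", (1988, 2009), 0.6),
        ("Pope Benedict", (2005, 2009), 0.8),
    ]
}
print(temporal_synonym_query("Pope Benedict XVI", (1990, 2000), index))
# ['Pope Benedict XVI', 'Cardinal Ratzinger', 'Joseph Ratzinger']
```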
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives.
Searching in such resources is not straightforward, because their contents are strongly time-dependent.
Example: for the query "Pope Benedict XVI" with dates "before 2005", a search is unable to retrieve documents about "Joseph Alois Ratzinger".
To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed.
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008].
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations or semantic shifts.
Synonyms are different words with similar meanings.
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity.
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005.
What are time-based synonyms?
Time-independent synonyms are invariant to time.
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time.
Application
News archive search: search terms are named entities, and publication dates of documents are temporal criteria.
Scenario 1: query "Pope Benedict XVI" and documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: query "Hillary R. Clinton" and documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
Discover time-based synonyms over time
Improve the accuracy of time of synonyms
Expand a query using time-based synonyms
3. Experiments:
Evaluate extracting and improving time of synonyms
Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
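Step 1 can be sketched as a sweep over revisions sorted by timestamp: each monthly snapshot W_tk maps a page title to its latest revision at or before t_k. Taking t_k as the first day of each month mirrors the snapshot dates given in the experiment setting; the (title, timestamp, text) revision tuples are an assumed input format, not MWDumper's output, and the function names are hypothetical.

```python
from datetime import datetime


def month_starts(first, last):
    """First day of every month from `first` to `last` inclusive."""
    y, m = first.year, first.month
    out = []
    while (y, m) <= (last.year, last.month):
        out.append(datetime(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def monthly_snapshots(revisions, first, last):
    """Map each t_k to {page title: text of its current revision at t_k}.

    revisions: iterable of (title, timestamp, text) tuples (assumed format).
    """
    revisions = sorted(revisions, key=lambda r: r[1])
    snapshots, current, i = {}, {}, 0
    for t_k in month_starts(first, last):
        # Apply every revision made up to t_k; later ones overwrite earlier ones.
        while i < len(revisions) and revisions[i][1] <= t_k:
            title, _, text = revisions[i]
            current[title] = text
            i += 1
        snapshots[t_k] = dict(current)  # copy, so later months do not alter this one
    return snapshots


revs = [
    ("A", datetime(2001, 3, 15), "a1"),
    ("A", datetime(2001, 4, 20), "a2"),
    ("B", datetime(2001, 4, 2), "b1"),
]
for t_k, pages in monthly_snapshots(revs, datetime(2001, 4, 1), datetime(2001, 5, 1)).items():
    print(t_k.date(), pages)
```

Copying the whole page map per month is fine for a toy example but would be far too memory-hungry for a full Wikipedia history; a real pipeline would stream snapshots to disk instead.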
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all words are capitalized (function words such as "of" and "the" excepted, as the example shows): President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) Titles for which at least 75% of occurrences in the article text itself are capitalized
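The three heuristics can be approximated in a few lines. This is a simplified sketch: the function-word list is an assumption (needed because the example title President_of_the_United_States contains lowercase "of" and "the"), and rule 3 counts every occurrence of the title in the article rather than applying any further refinements the original method may use.

```python
import re

# Hypothetical list of function words that need not be capitalized in multi-word titles.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def is_named_entity(title, article_text):
    """Capitalization heuristics deciding whether a Wikipedia title names an entity."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters (e.g. UNICEF, WHO).
    if sum(c.isupper() for c in word) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75


print(is_named_entity("President_of_the_United_States", ""))                        # True
print(is_named_entity("UNICEF", ""))                                                # True
print(is_named_entity("Apple", "The apple fell. An apple a day. Apple pie."))       # False
```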
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
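Anchor-text extraction can be sketched with a regular expression over wiki markup. The sketch handles the [[Target]] and [[Target|anchor]] link forms (ignoring any #section part), treats a bare link's target as its own anchor text, and counts each (entity, synonym) pair, giving a synonym snapshot S_tk with frequencies. Templates, redirects and nested links are not handled, and the function names are hypothetical.

```python
import re
from collections import Counter

# [[Target]], [[Target#Section]], [[Target|anchor]] or [[Target#Section|anchor]]
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def normalize_title(title):
    """Underscores become spaces; the first letter is upper-cased, as in page titles."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]


def synonym_snapshot(pages, entities):
    """Count (entity, synonym) pairs in one snapshot.

    pages: {title: wikitext} at t_k; entities: normalized entity titles E_tk.
    """
    counts = Counter()
    for text in pages.values():
        for target, anchor in LINK.findall(text):
            entity = normalize_title(target)
            if entity in entities:
                # A bare link contributes its own target text as the synonym.
                synonym = (anchor or target.replace("_", " ")).strip()
                counts[(entity, synonym)] += 1
    return counts


pages = {
    "Example page": (
        "In 2009 [[President_of_the_United_States|Barack Obama]] met "
        "[[Pope Benedict XVI]], formerly [[Pope Benedict XVI|Cardinal Joseph Ratzinger]]."
    )
}
entities = {"President of the United States", "Pope Benedict XVI"}
for (entity, synonym), count in sorted(synonym_snapshot(pages, entities).items()):
    print(entity, "->", synonym, count)
```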
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia article revisions (8 years) in which they appear, not temporal expressions extracted from the contents.
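Given one synonym snapshot per month, the time period of each entity-synonym relationship could be derived roughly as follows, assuming (as a simplification) that a period runs from the first to the last snapshot containing the pair, with any gaps in between ignored. The data layout and names are hypothetical.

```python
def time_periods(snapshots):
    """Map each (entity, synonym) pair to the (first, last) month it appears in.

    snapshots: {(year, month): iterable of (entity, synonym) pairs in S_tk}
    """
    periods = {}
    for t_k in sorted(snapshots):
        for pair in snapshots[t_k]:
            first, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (first, t_k)
    return periods


def fmt(month):
    """Format (year, month) as MM/YYYY."""
    return f"{month[1]:02d}/{month[0]}"


snaps = {
    (2005, 5): {("Pope Benedict XVI", "Joseph Ratzinger")},
    (2007, 1): {("Pope Benedict XVI", "Joseph Ratzinger"),
                ("Pope Benedict XVI", "Cardinal Joseph Ratzinger")},
    (2009, 3): {("Pope Benedict XVI", "Joseph Ratzinger")},
}
first, last = time_periods(snaps)[("Pope Benedict XVI", "Joseph Ratzinger")]
print(fmt(first), "-", fmt(last))  # 05/2005 - 03/2009
```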
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{ij}$ by computing the rate of occurrence from document streams
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
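A from-scratch sketch of Kleinberg's batched burst model with the two-state setting listed in the experiment setup (burst rate = 2 x base rate, gamma = 1) is given here; with only two states, the "ratio of each subsequent state" parameter has no effect. It is not the implementation used in the talk. Each batch is assumed to be one month of NYT articles, a Viterbi pass finds the cheapest base/burst state sequence, and each burst's weight is the summed cost saving of the burst state, following Kleinberg's definition. The function names and toy counts are hypothetical.

```python
import math


def kleinberg_two_state(relevant, totals, s=2.0, gamma=1.0):
    """Batched two-state burst detection in the style of Kleinberg (KDD 2002).

    relevant[t]: documents in batch t mentioning the entity-synonym pair
    totals[t]:   all documents in batch t
    Returns (states, savings): states[t] is 0 (base) or 1 (burst), and savings[t]
    is how much cheaper the burst state is than the base state at batch t.
    """
    n = len(relevant)
    if n == 0 or sum(totals) == 0:
        return [], []
    p0 = sum(relevant) / sum(totals)            # base rate of occurrence
    if p0 <= 0.0 or p0 >= 1.0:
        return [0] * n, [0.0] * n
    rates = (p0, min(s * p0, 0.9999))           # burst rate = s * base rate, kept below 1
    up = gamma * math.log(n)                    # cost of moving from state 0 to state 1

    def cost(i, t):
        # Negative log-likelihood of relevant[t] hits out of totals[t] at rate rates[i];
        # the binomial coefficient is the same for both states, so it is omitted.
        r, d = relevant[t], totals[t]
        return -(r * math.log(rates[i]) + (d - r) * math.log(1.0 - rates[i]))

    best = [cost(0, 0), up + cost(1, 0)]        # the automaton starts in the base state
    back = []
    for t in range(1, n):
        step, pointers = [], []
        for j in (0, 1):
            options = [best[i] + (up if j > i else 0.0) for i in (0, 1)]
            prev = 0 if options[0] <= options[1] else 1
            step.append(options[prev] + cost(j, t))
            pointers.append(prev)
        best = step
        back.append(pointers)

    # Trace the cheapest state sequence backwards (Viterbi).
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()
    savings = [cost(0, t) - cost(1, t) for t in range(n)]
    return states, savings


def burst_intervals(states, savings):
    """Maximal runs of state 1 as (start, end, weight); weight sums the savings in the run."""
    intervals, start = [], None
    for t, q in enumerate(states + [0]):
        if q == 1 and start is None:
            start = t
        elif q == 0 and start is not None:
            intervals.append((start, t - 1, sum(savings[start:t])))
            start = None
    return intervals


# Toy monthly counts: the pair is mentioned unusually often in months 3 to 5.
relevant = [5, 5, 5, 30, 35, 30, 5, 5, 5, 5]
totals = [100] * 10
states, savings = kleinberg_two_state(relevant, totals)
print(states)                            # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(burst_intervals(states, savings))  # one interval from batch 3 to batch 5
```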
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
where $pf(s_j)$ is the number of time partitions in which $s_j$ occurs;
$tf(s_j)$ is the averaged tf of $s_j$ over all time partitions, $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$;
$\mu$ weighs the importance of the temporal feature against the frequency feature, and $\mu = 0.5$ yields the best performance in the experiments.
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.
High usage over time, i.e., a high averaged frequency over time.
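A direct transcription of the TIDP formula follows, assuming that partitions are the monthly snapshots and that tf(s_j, p_i) is the synonym's anchor-text count in partition p_i. The slides do not say whether pf and the averaged tf are normalised before mixing; this sketch uses the raw values as written, so on real data one feature may dominate the other. Names and data are hypothetical.

```python
def tidp(partition_tf, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j).

    partition_tf maps each time partition p_i to tf(s_j, p_i).
    pf(s_j): number of partitions in which s_j occurs (tf > 0).
    tf(s_j): sum_i tf(s_j, p_i) / pf(s_j), the average over those partitions.
    """
    pf = sum(1 for tf in partition_tf.values() if tf > 0)
    if pf == 0:
        return 0.0
    avg_tf = sum(partition_tf.values()) / pf
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms, k=3, mu=0.5):
    """Top-k synonyms by TIDP; synonyms maps a synonym to its per-partition tf."""
    scored = {syn: tidp(tfs, mu) for syn, tfs in synonyms.items()}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:k]


# Made-up per-partition counts.
synonyms = {
    "Barack Hussein Obama II": {"2007-02": 4, "2008-01": 2, "2009-03": 0},
    "Sen. Barack Obama": {"2007-07": 1},
}
print(rank_time_independent(synonyms))
# [('Barack Hussein Obama II', 2.5), ('Sen. Barack Obama', 1.0)]
```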
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
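Because TDP is just the frequency at t_k, ranking reduces to a lookup and a sort. A minimal sketch follows, with a hypothetical {time: tf} layout and made-up counts.

```python
def tdp(monthly_tf, t_k):
    """TDP(s_j, t_k) = tf(s_j, t_k): the synonym's frequency at time t_k (0 if absent)."""
    return monthly_tf.get(t_k, 0)


def rank_time_dependent(synonyms, t_k, k=3):
    """Top-k synonyms at t_k; synonyms maps a synonym to {time: tf}."""
    scored = [(syn, tdp(tfs, t_k)) for syn, tfs in synonyms.items()]
    scored = [(syn, score) for syn, score in scored if score > 0]  # drop absent synonyms
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Made-up frequencies keyed by (year, month).
synonyms = {
    "Senator Clinton": {(2003, 6): 7, (2008, 1): 2},
    "First Lady": {(1997, 1): 9},
    "Hillary Clinton": {(2003, 6): 3},
}
print(rank_time_dependent(synonyms, (2003, 6)))
# [('Senator Clinton', 7), ('Hillary Clinton', 3)]
```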
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of rate of second state to base state 2; ratio of rate of each subsequent state to previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org} \; s_1^{w_1} \; s_2^{w_2} \; \ldots \; s_k^{w_k}$$
Measurement: Mean Average Precision (MAP), R-precision, and Recall
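A sketch of how the expanded query $q_{exp} = q_{org}\, s_1^{w_1} \ldots s_k^{w_k}$ might be assembled as a string. The `term^weight` notation mirrors the slide's formula; wrapping multi-word synonyms in double quotes as phrases and rounding weights to two decimals are assumptions about the target query syntax, not details from the slides, and the function name is hypothetical.

```python
from typing import Dict


def expand_query(original: str, weights: Dict[str, float], k: int = 3) -> str:
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from TIDP-weighted synonyms."""
    # Keep the k synonyms with the highest TIDP weight (ties broken alphabetically)
    top = sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:k]
    boosted = []
    for synonym, weight in top:
        # Assumed convention: quote multi-word synonyms so they act as phrases
        term = f'"{synonym}"' if " " in synonym else synonym
        boosted.append(f"{term}^{weight:.2f}")
    return " ".join([original] + boosted)
```

With hypothetical weights, `expand_query("Barack Obama", {"Barack Hussein Obama II": 5.5, "Senator Obama": 2.5, "Obama": 1.0}, k=2)` gives `Barack Obama "Barack Hussein Obama II"^5.50 "Senator Obama"^2.50`.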
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20, and 30 retrieved documents
Examples of temporal queries
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
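The measures named in the two query-expansion settings (MAP, R-precision, recall, and precision at k) have standard definitions. The following is a minimal per-topic sketch, assuming a ranked list of document IDs and a set of relevant IDs; it is not the evaluation tool used in the experiments, and it divides P@k by k even when fewer than k documents are retrieved.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant retrieved document, divided by |relevant|."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over topics, one (ranked, relevant) pair each."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Dividing average precision by the total number of relevant documents, not only the retrieved ones, means relevant documents that are never retrieved count as zero precision.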
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization", or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust 2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries using the two different NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision, and Recall (significance level p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (significance level p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a Temporal Query expanded with Synonyms w.r.t. the time $t_q$
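The difference between TQ and TSQ can be sketched as follows: TSQ adds only those synonyms whose validity period overlaps the query time $t_q$. This is a hypothetical illustration that assumes each synonym carries an inclusive (start, end) period in years and that $t_q$ is also a year range; it builds the term list only and does not model the retrieval engine.

```python
from typing import List, Tuple

Period = Tuple[int, int]  # inclusive (start year, end year); an assumed granularity


def synonyms_at(synonyms: List[Tuple[str, int, int]], tq: Period) -> List[str]:
    """Synonyms whose (start, end) period overlaps the query time t_q."""
    q_start, q_end = tq
    return [s for s, start, end in synonyms if start <= q_end and q_start <= end]


def temporal_query(keyword: str, tq: Period) -> Tuple[List[str], Period]:
    """TQ: the keyword w_q with the time constraint t_q, no expansion."""
    return [keyword], tq


def temporal_synonym_query(
    keyword: str, synonyms: List[Tuple[str, int, int]], tq: Period
) -> Tuple[List[str], Period]:
    """TSQ: the keyword w_q expanded with the synonyms valid at t_q."""
    return [keyword] + synonyms_at(synonyms, tq), tq
```

For the query "Pope Benedict XVI" with $t_q$ = 1988-2005 and an illustrative synonym period ("Cardinal Ratzinger", 1977, 2005), TSQ adds "Cardinal Ratzinger", whereas for $t_q$ = 2006-2008 it adds nothing.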
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Problem statement
In recent years, document archives have become publicly available
E.g., the Internet Archive, digital libraries, and news archives
Searching such resources is not straightforward
Contents in these resources are strongly time-dependent
Query "Pope Benedict XVI" and dates "before 2005"
Unable to retrieve documents about "Joseph Alois Ratzinger"
To improve the retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed
Observation
Named entities (people, organization, location, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
They are very dynamic in appearance, i.e., relationships between terms change over time
E.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search
Search terms are named entities
Publication dates of documents are temporal criteria
Scenario 1
Query "Pope Benedict XVI" and documents written before 2005
Documents about "Joseph Alois Ratzinger" are relevant
Scenario 2
Query "Hillary R. Clinton" and documents written from 1997 to 2002
Documents about "New York Senator" and "First Lady of the United States" are relevant
Challenge
Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models
Wikipedia viewed as a temporal resource
2. Proposed approaches
Discover time-based synonyms over time
Improve the accuracy of time of synonyms
Expand a query using time-based synonyms
3. Experiments
Evaluate extracting and improving time of synonyms
Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia according to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
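Step 1 can be illustrated with a small sketch. It assumes the revision history is available as (page title, timestamp, wikitext) tuples read from a Wikipedia dump, and that the snapshot $W_{t_k}$ is the latest revision of every page made no later than $t_k$, matching the figure caption. The record layout and function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical revision record: (page title, revision timestamp, wikitext)
Revision = Tuple[str, datetime, str]


def month_points(first: datetime, last: datetime) -> List[datetime]:
    """Time points t_1..t_z at granularity g = month (first day of each month)."""
    points, year, month = [], first.year, first.month
    while (year, month) <= (last.year, last.month):
        points.append(datetime(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return points


def snapshot_at(revisions: Iterable[Revision], tk: datetime) -> Dict[str, str]:
    """W_tk: the latest revision of every page made no later than t_k."""
    latest: Dict[str, Tuple[datetime, str]] = {}
    for page, ts, text in revisions:
        if ts <= tk and (page not in latest or ts > latest[page][0]):
            latest[page] = (ts, text)
    return {page: text for page, (_, text) in latest.items()}


def partition(
    revisions: List[Revision], first: datetime, last: datetime
) -> Dict[datetime, Dict[str, str]]:
    """W = {W_t1, ..., W_tz}: one snapshot per monthly time point."""
    return {tk: snapshot_at(revisions, tk) for tk in month_points(first, last)}
```

Monthly points from 03/2001 to 03/2008 give 85 time points. Rescanning all revisions for every snapshot keeps the sketch short; for a multi-terabyte history a single pass over time-sorted revisions would be more practical.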
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all words are capitalized
President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
UNICEF and WHO are named entities
3) 75% of occurrences in the article text itself are capitalized
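The three heuristics can be sketched as a single predicate over a Wikipedia title and its article text. This is a simplified illustration, not Bunescu and Pasca's exact procedure: the small function-word list (so that "of" and "the" in President_of_the_United_States do not block rule 1) and applying rule 3 only to single-word titles are assumptions.

```python
import re

# Assumed list of words allowed to stay lowercase inside a multi-word title
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to"}


def is_named_entity(title: str, article_text: str = "") -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters (e.g. UNICEF)
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article are capitalized
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for h in hits if h[0].isupper())
    return capitalized / len(hits) >= 0.75
```

For example, "Paris" in a text where every occurrence is capitalized passes rule 3, while "Guitar" in a text that always writes "guitar" fails it.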
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
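The anchor-text extraction in Step 1 and the accumulation in Step 2 can be sketched with a regular expression over wikitext. This sketch assumes piped links of the form [[Target|anchor]] and titles normalized by replacing spaces with underscores; it ignores unpiped links, templates, and redirects, which a real pipeline would likely need to handle. The names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set

# Piped links [[Target|anchor]] or [[Target#Section|anchor]]; unpiped [[Target]] is skipped
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def synonym_snapshot(wikitexts: Iterable[str], entities: Set[str]) -> Dict[str, Counter]:
    """S_tk: for each entity page, count the anchor texts of links pointing to it."""
    snapshot: Dict[str, Counter] = defaultdict(Counter)
    for text in wikitexts:
        for target, anchor in PIPED_LINK.findall(text):
            # Normalize the link target to the underscore form of the page title
            title = target.strip().replace(" ", "_")
            if title in entities:
                snapshot[title][anchor.strip()] += 1
    return dict(snapshot)
```

Keeping counts rather than a plain set of anchors preserves the term frequencies that the later ranking steps rely on.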
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time of a synonym is given by the timestamps of the Wikipedia articles (8 years) in which it appears, not by temporal expressions extracted from the contents
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weight, i.e., periods of occurrence and intensity
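To make the burst-detection step concrete, here is a minimal sketch of Kleinberg's two-state automaton for a stream of event timestamps (for example, publication times of articles mentioning a synonym), decoded with the Viterbi algorithm. The defaults s = 2 and gamma = 1 mirror the settings reported for the experiments; treating the stream as sorted numeric timestamps, the function names, and leaving out the burst weight are assumptions of this sketch, which is not the implementation used in the study.

```python
import math
from typing import List, Sequence, Tuple


def two_state_bursts(offsets: Sequence[float], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each inter-arrival gap with state 0 (base) or 1 (burst).

    Gaps in state i are exponential with rate alpha_i, where alpha_0 = n / T
    and alpha_1 = s * alpha_0; moving up to the burst state costs gamma * ln(n).
    """
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    n, total = len(gaps), sum(gaps)
    if n == 0 or total <= 0:
        return [0] * n
    alpha = (n / total, s * n / total)
    up_cost = gamma * math.log(n)

    def gap_cost(state: int, x: float) -> float:
        # Negative log-likelihood of gap x under rate alpha[state]
        return alpha[state] * x - math.log(alpha[state])

    cost = [0.0, math.inf]  # the automaton starts in the base state
    back: List[Tuple[int, int]] = []
    for x in gaps:
        new_cost, pointers = [], []
        for j in (0, 1):
            # Cheapest previous state i for ending in state j (only 0 -> 1 is penalized)
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[i_best] + gap_cost(j, x))
            pointers.append(i_best)
        cost = new_cost
        back.append((pointers[0], pointers[1]))
    # Viterbi backtracking from the cheaper final state
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    return states[::-1]


def burst_intervals(offsets: Sequence[float], states: List[int]) -> List[Tuple[float, float]]:
    """Merge consecutive burst-state gaps into (start, end) intervals."""
    intervals, start = [], None
    for t, st in enumerate(states):
        if st == 1 and start is None:
            start = offsets[t]
        elif st == 0 and start is not None:
            intervals.append((start, offsets[t]))
            start = None
    if start is not None:
        intervals.append((start, offsets[len(states)]))
    return intervals
```

For example, timestamps spaced 10 units apart from 0 to 100, then 1 unit apart up to 120, then 10 units apart again up to 220, yield the single burst interval (100, 120).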
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot \mathrm{pf}(s_j) + (1 - \mu) \cdot \overline{\mathrm{tf}}(s_j), \qquad \overline{\mathrm{tf}}(s_j) = \frac{\sum_i \mathrm{tf}(s_j, p_i)}{\mathrm{pf}(s_j)}$$
$\mathrm{pf}(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs.
$\overline{\mathrm{tf}}(s_j)$ is the averaged tf of $s_j$: its tf summed over all time partitions $p_i$, divided by the number of partitions in which it occurs.
$\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.
High usage over time, i.e., a high averaged frequency over time.
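The TIDP mixture model can be sketched in Python as follows. The input layout (a dictionary from each synonym to its per-partition term frequencies) and the example counts are hypothetical; the score is computed exactly as the formula is written, with no normalization of pf and the averaged tf, since the slide does not mention any.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """Score synonyms with TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition maps each synonym s_j to {partition p_i: tf(s_j, p_i)}.
    """
    scores = {}
    for synonym, per_partition in tf_by_partition.items():
        # pf(s_j): number of time partitions in which s_j actually occurs.
        counts = [tf for tf in per_partition.values() if tf > 0]
        pf = len(counts)
        if pf == 0:
            continue  # the synonym never occurs, so it gets no score
        avg_tf = sum(counts) / pf  # summed tf divided by pf(s_j)
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


# Hypothetical monthly tf counts, for illustration only.
example = {
    "Barack Hussein Obama II": {"2007-02": 3, "2007-03": 1, "2008-01": 2},
    "Sen. Barack Obama": {"2007-07": 2},
}
ranking = sorted(tidp_scores(example).items(), key=lambda item: -item[1])
```

With $\mu = 0.5$, the synonym seen in three partitions scores 0.5 · 3 + 0.5 · 2 = 2.5, while the one seen in a single partition scores 0.5 · 1 + 0.5 · 2 = 1.5, so spreading over time is rewarded even at equal average frequency.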
3 Query Expansion: Ranking Time-dependent Synonyms
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = \mathrm{tf}(s_j, t_k)$$
where $\mathrm{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: Only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
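Because TDP is just the term frequency at $t_k$, ranking time-dependent synonyms reduces to a lookup and a sort. A minimal sketch, assuming a hypothetical dictionary of per-period tf counts and a hypothetical `tdp_top_k` helper:

```python
def tdp_top_k(tf_by_time, t_k, k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k); return the top k."""
    scored = [(synonym, per_time.get(t_k, 0)) for synonym, per_time in tf_by_time.items()]
    # Synonyms that do not occur at t_k are irrelevant for a query about t_k.
    scored = [(synonym, tf) for synonym, tf in scored if tf > 0]
    # Highest tf first; ties broken alphabetically so the ranking is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical yearly tf counts for synonyms of "Pope Benedict XVI".
counts = {
    "Cardinal Joseph Ratzinger": {2004: 5, 2006: 1},
    "Joseph Ratzinger": {2004: 2, 2006: 3},
    "Pope Benedict XVI": {2006: 9},
}
```

For $t_k$ = 2004, "Cardinal Joseph Ratzinger" ranks first; for 2006 it drops to last, which is the time sensitivity the definition is meant to capture.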
4 Evaluation: Experiment Setting
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting and improving the time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 28 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007.
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights $w_i = \mathrm{TIDP}(s_i)$:
$$q_{\mathrm{exp}} = q_{\mathrm{org}}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$$
Measurement: Mean Average Precision (MAP), R-precision and recall
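The expanded query $q_{\mathrm{exp}}$ can be assembled as a query string. This is a sketch: it writes each synonym as a quoted phrase followed by `^weight`, mirroring the $s_i^{w_i}$ notation, and the helper name `expand_query` is hypothetical. Whether Terrier accepts a boost on a quoted phrase in exactly this form is an assumption to check against Terrier's query-language documentation.

```python
def expand_query(original_query, tidp, k=3):
    """Append the top-k synonyms to the query, each boosted by its TIDP score."""
    # Highest TIDP first; ties broken alphabetically for a stable result.
    top = sorted(tidp.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        # Quoted phrase with a ^weight boost (assumed syntax, see lead-in).
        parts.append(f'"{synonym}"^{weight:.2f}')
    return " ".join(parts)
```

For example, `expand_query("Barack Obama", {"Barack Hussein Obama II": 2.5, "Sen. Barack Obama": 1.5, "Obama": 0.5}, k=2)` keeps the original terms and appends the two highest-weighted synonyms.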
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications.
20 strongly time-dependent queries were selected.
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries
Temporal query: Named Entity | Temporal query: Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
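The evaluation measures used in both experiments (MAP, R-precision, recall and P@k) have standard definitions. A minimal per-query sketch follows; document identifiers and relevance sets are hypothetical inputs, and MAP is the mean of `average_precision` over all queries.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def recall(ranked, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0
```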
4 Evaluation: Experimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process.
Number of queries under the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (statistical significance tested at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (statistical significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms w.r.t. the time $t_q$
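The slides do not spell out how synonyms are matched to $t_q$ in TSQ. One simple possibility, assumed here, is to keep the synonyms whose time period overlaps the query period. The sketch below uses the Hillary Rodham Clinton periods from the extracted-synonyms output table; the helper names are hypothetical.

```python
def synonyms_for_period(periods, entity, q_start, q_end):
    """Return synonyms of entity whose time period overlaps [q_start, q_end]."""
    return sorted(
        synonym
        for (name, synonym), (start, end) in periods.items()
        if name == entity and start <= q_end and end >= q_start  # interval overlap test
    )


def temporal_expand(keyword, periods, q_start, q_end):
    """TSQ-style expansion (assumed): the keyword w_q plus its synonyms valid around t_q."""
    return [keyword] + synonyms_for_period(periods, keyword, q_start, q_end)


# Time periods from the extracted-synonyms table; months are (year, month) tuples.
periods = {
    ("Hillary Rodham Clinton", "Hillary Clinton"): ((2003, 8), (2009, 3)),
    ("Hillary Rodham Clinton", "Sen. Hillary Clinton"): ((2007, 3), (2009, 3)),
    ("Hillary Rodham Clinton", "Senator Clinton"): ((2007, 11), (2009, 3)),
}
```

Tuples of (year, month) compare chronologically in Python, so the overlap test needs no date library. A query for 2004-2006 picks up only "Hillary Clinton", while a query for 2008 picks up all three synonyms.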
QUEST: Query Expansion using Synonyms over Time
5 Conclusions: Conclusions and Future Work
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest
Thank you
1 Introduction: Problem Statement
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives.
Searching in such resources is not straightforward, because their contents are strongly time-dependent.
For the query "Pope Benedict XVI" with dates "before 2005", a search is unable to retrieve documents about "Joseph Alois Ratzinger".
To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed.
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008].
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations or semantic shift.
Synonyms are different words with similar meanings.
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity.
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005.
What are time-based synonyms?
Time-independent synonyms are invariant to time.
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time.
Application
News archive search: search terms are named entities, and publication dates of documents are temporal criteria.
Scenario 1: Query "Pope Benedict XVI" for documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: Query "Hillary R. Clinton" for documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge
Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time.
1 Introduction: Contributions
Contributions
1 Formal models: Wikipedia viewed as a temporal resource
2 Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3 Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
2 Synonym Detection: Entity Recognition and Synonym Extraction
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
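Step 1 can be sketched as follows, assuming revisions are available as (page title, timestamp, text) tuples (a hypothetical layout, e.g., after parsing a history dump) and that the snapshot at $t_k$ holds, for every page, the latest revision made before $t_k$. The cutoff dates would be the first day of each month, matching the snapshot dates listed in the data collection; the function names are hypothetical.

```python
from datetime import datetime


def snapshot_at(revisions, cutoff):
    """Return {page title: text} using each page's latest revision made before cutoff."""
    snapshot = {}
    for title, timestamp, text in sorted(revisions, key=lambda rev: rev[1]):
        if timestamp >= cutoff:
            break  # revisions are sorted, so all remaining ones are too recent
        snapshot[title] = text  # a later revision overwrites an earlier one
    return snapshot


def monthly_snapshots(revisions, cutoffs):
    """Build W = {W_t1, ..., W_tz}, one snapshot per cutoff date."""
    return {cutoff: snapshot_at(revisions, cutoff) for cutoff in cutoffs}
```

Re-sorting for every cutoff keeps the sketch short; for a full history dump one would sort once and sweep the cutoffs in order.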
Example [Bunescu and Pasca, EACL'2006]
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) Otherwise, at least 75% of the title's occurrences in the article text itself are capitalized
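The three heuristics can be sketched as a single predicate. The function-word list is a small hypothetical stand-in for the prepositions, determiners, conjunctions, relative pronouns and negations that are not required to be capitalized, and occurrence counting is a simple case-insensitive whole-word match, which is an assumption.

```python
import re

# Hypothetical, abbreviated stand-in for the function words that are ignored.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "not",
                  "of", "on", "or", "that", "the", "to", "which", "who", "with"}


def is_named_entity(title, article_text):
    """Apply the three title heuristics to one Wikipedia page."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # 1) Multi-word title: every content word must be capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # 2) Single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # 3) Otherwise: at least 75% of its occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```

"President_of_the_United_States" passes rule 1 because "of" and "the" are skipped, and "UNICEF" passes rule 2 without looking at the article text.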
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_i = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
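The two steps can be sketched with a regular expression over MediaWiki piped links of the form [[Target|anchor text]]. This is a simplification, assumed for illustration: it ignores templates, nested links and Wikipedia's case-insensitive first letter of titles, and skips unpiped links, whose visible text is just the title. The names `PIPED_LINK` and `synonym_snapshot` are hypothetical.

```python
import re
from collections import Counter

# Piped wiki link [[Target|anchor text]]; links without a pipe carry no alternative name.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def synonym_snapshot(pages, entities):
    """Build S_tk: count (entity, anchor text) pairs over all pages of snapshot W_tk.

    pages: {title: wikitext} for one snapshot; entities: set of named-entity titles E_tk.
    """
    relations = Counter()
    for wikitext in pages.values():
        for target, anchor in PIPED_LINK.findall(wikitext):
            entity = target.strip().replace(" ", "_")  # normalize to underscore titles
            if entity in entities:
                relations[(entity, anchor.strip())] += 1
    return relations
```

Keeping counts rather than a plain set also yields $\mathrm{tf}(s_j, t_k)$ for the snapshot, which is what the TIDP and TDP rankings consume.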
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time of a synonym is given by the timestamps of the Wikipedia articles (8 years) in which it appears, not by temporal expressions extracted from the contents.
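Given one synonym snapshot per month, a time period like those in the table can be derived from the snapshot timestamps. The sketch assumes that a relationship's period runs from the first to the last monthly snapshot in which it appears, ignoring any gaps in between; the input layout and the name `time_periods` are hypothetical.

```python
def time_periods(snapshots):
    """Map each (entity, synonym) pair to (first month, last month) in which it was seen.

    snapshots: {(year, month): iterable of (entity, synonym) pairs found in that snapshot}.
    """
    periods = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            start = periods[pair][0] if pair in periods else month
            periods[pair] = (start, month)  # months are visited in order, so this is the latest
    return periods
```

Because only article timestamps are used, the period says when Wikipedia used the name, which need not match when the name was used in the real world. This is what motivates the NYT-based burst detection that follows.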
2 Synonym Detection: Improving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
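A minimal reimplementation of Kleinberg's two-state model for batched document streams illustrates the idea. It is not the implementation used in the experiments. It takes the parameters listed in the experiment setting (2 states, rate ratio s = 2, gamma = 1) and assumes batches such as months of NYT articles, with counts of how many articles mention a given entity-synonym pair. The burst weight is the total cost saved by the bursty state over the base state across the interval, following Kleinberg's definition.

```python
import math


def kleinberg_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Two-state burst detection over batches (after Kleinberg, KDD 2002).

    relevant[t]: documents in batch t that mention the entity-synonym pair.
    totals[t]: all documents in batch t.
    Returns (start, end, weight) tuples for bursty intervals, end inclusive.
    """
    n = len(totals)
    if n == 0 or sum(totals) == 0:
        return []
    p0 = sum(relevant) / sum(totals)  # base rate
    if p0 <= 0 or p0 >= 1:
        return []
    rates = [p0, min(s * p0, 0.9999)]  # bursty rate = s * base rate, kept below 1

    def cost(state, r, d):
        # Negative log-likelihood of r hits out of d; the binomial coefficient
        # is the same for both states, so it is omitted.
        p = rates[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)  # moving up costs gamma * ln(n); moving down is free

    # Viterbi: best[i] is the cheapest cost of a state sequence ending in state i.
    best = [cost(0, relevant[0], totals[0]), up + cost(1, relevant[0], totals[0])]
    back = []
    for t in range(1, n):
        into0 = min((best[0], 0), (best[1], 1))
        into1 = min((best[0] + up, 0), (best[1], 1))
        back.append((into0[1], into1[1]))
        best = [into0[0] + cost(0, relevant[t], totals[t]),
                into1[0] + cost(1, relevant[t], totals[t])]

    # Trace the optimal state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for t in range(n - 1, 0, -1):
        state = back[t - 1][state]
        states.append(state)
    states.reverse()

    # Collect maximal runs of the bursty state and their weights.
    bursts, start = [], None
    for t, st in enumerate(states + [0]):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(cost(0, relevant[u], totals[u]) - cost(1, relevant[u], totals[u])
                         for u in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts
```

With ten batches of 100 documents in which the pair appears 2 times per batch except for 20, 25 and 22 times in batches 4 to 6, the sketch reports one burst covering batches 4 to 6 with a positive weight.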
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the averaged term frequency of $s_j$: its total frequency over all time partitions $p_i$, divided by $pf(s_j)$, i.e., $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
$\mu$ balances the importance of the temporal feature against the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
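A minimal sketch of TIDP ranking, assuming each synonym comes with a list of term frequencies, one per time partition (a hypothetical input format). pf and the averaged tf are combined raw, exactly as in the formula; the slide does not say whether either feature is normalized first. The counts in the usage example are toy values, not data from the paper.

```python
from typing import Dict, List, Tuple


def tidp(tf_per_partition: List[int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j)."""
    pf = sum(1 for tf in tf_per_partition if tf > 0)   # partitions where s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(tf_per_partition) / pf                # total tf divided by pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(tf_table: Dict[str, List[int]],
                          mu: float = 0.5) -> List[Tuple[str, float]]:
    """Rank synonyms by TIDP score, highest first."""
    scores = [(syn, tidp(tfs, mu)) for syn, tfs in tf_table.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With toy counts, a synonym seen steadily in four partitions ([3, 4, 2, 3], score 3.5) outranks one seen only in a single partition ([0, 0, 5, 0], score 3.0), which is the robustness effect the intuition describes.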
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
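Time-dependent ranking reduces to sorting by frequency at $t_k$. A sketch, assuming frequencies are stored per synonym and per time key (here, hypothetical year strings); the helper name and data are illustrative only.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf_by_time: Dict[str, Dict[str, int]], t_k: str,
                        top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    scored = [(syn, freqs.get(t_k, 0)) for syn, freqs in tf_by_time.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]   # only synonyms seen at t_k
    scored.sort(key=lambda item: (-item[1], item[0]))      # ties broken alphabetically
    return scored[:top_k]
```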
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001 to 01/03/2008), about 2.8 terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper); Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$$
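The expanded query can be assembled as a plain string. This sketch mirrors the $s^{w}$ notation with a `term^weight` suffix; quoting multi-word synonyms as phrases is an assumption, and the exact syntax accepted by a given Terrier version should be checked against its query-language documentation. `expand_query` is a hypothetical helper.

```python
from typing import List, Tuple


def expand_query(q_org: str, ranked: List[Tuple[str, float]], k: int = 3) -> str:
    """Append the top-k synonyms to q_org, each boosted by its TIDP score."""
    parts = [q_org]
    for synonym, weight in ranked[:k]:
        # Quoting multi-word synonyms as phrases is an assumption of this sketch.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```

For example, `expand_query("barack obama", [("Barack Hussein Obama II", 3.5), ("Obama", 3.0)], k=2)` returns `barack obama "Barack Hussein Obama II"^3.50 Obama^3.00`.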
Measurement: Mean Average Precision (MAP), R-precision and recall
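For reference, the three measures can be computed per topic as below. This is a plain re-implementation for illustration (TREC evaluations typically use the trec_eval tool). `runs` maps topic IDs to ranked document IDs and `qrels` maps topic IDs to sets of relevant document IDs; both structures are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0


def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over all topics in qrels; a topic with no run contributes 0."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)
```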
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
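Precision at a cut-off k can be sketched in the same style. As in common TREC practice, this sketch divides by k even when fewer than k documents were retrieved; that this matches how the reported P@k values were computed is an assumption. The helper names are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Relevant documents among the top k, divided by k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_precision_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]],
                        k: int) -> float:
    """P@k averaged over all temporal queries in qrels."""
    if not qrels:
        return 0.0
    return sum(precision_at_k(runs.get(q, []), rel, k)
               for q, rel in qrels.items()) / len(qrels)
```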
Examples of temporal queries
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca’s named entity recognition of Wikipedia titles, with the filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
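The filtering criteria can be sketched as predicates over extracted relationships. Two readings are assumptions here: a relationship is dropped only when both conditions hold (the slide joins them with "and"), and the BPCF category test is a case-insensitive substring match on the entity's Wikipedia categories. The `Relationship` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Relationship:
    entity: str
    synonym: str
    interval_months: int      # length of the synonym's time period, in months
    avg_frequency: float      # average frequency per snapshot (hypothetical field)
    categories: tuple = ()    # Wikipedia categories of the entity page


def bpf_filter(rels: Iterable[Relationship]) -> List[Relationship]:
    """Drop a relationship when its interval is < 6 months AND its average frequency is < 2."""
    return [x for x in rels if not (x.interval_months < 6 and x.avg_frequency < 2)]


def bpcf_filter(rels: Iterable[Relationship]) -> List[Relationship]:
    """BPF filtering, then keep entities whose categories mention people, organization or company."""
    wanted = ("people", "organization", "company")
    return [x for x in bpf_filter(rels)
            if any(w in c.lower() for c in x.categories for w in wanted)]
```

Under the alternative reading (drop when either condition holds), the `and` in `bpf_filter` becomes `or`.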
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries using the two different NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
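The two query-side recognizers might look like this. Exact matching is done case-insensitively after mapping spaces to underscores, which is an assumption. The slide does not say how related pages are found, so `related` is a hypothetical callable (for example, a search over Wikipedia) returning page titles in rank order.

```python
from typing import Callable, Iterable, List


def mw_nerq(query: str, titles: Iterable[str]) -> List[str]:
    """MW-NERQ: the query names an entity if it exactly matches a Wikipedia page title."""
    index = {t.lower(): t for t in titles}              # case-insensitive lookup (assumption)
    key = query.strip().replace(" ", "_").lower()
    return [index[key]] if key in index else []


def mrw_nerq(query: str, titles: Iterable[str],
             related: Callable[[str], List[str]], k: int = 2) -> List[str]:
    """MRW-NERQ: the exact match plus the top-k related pages returned by `related`."""
    entities = mw_nerq(query, titles)
    for page in related(query)[:k]:
        if page not in entities:
            entities.append(page)
    return entities
```

A query counts as a named-entity query when the returned list is non-empty.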
Query expansion using time-independent synonyms
MAP, R-precision and recall ( indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms w.r.t. time $t_q$
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives
Searching in such resources is not straightforward, because their contents are strongly time-dependent
Query “Pope Benedict XVI” with dates “before 2005”: unable to retrieve documents about “Joseph Alois Ratzinger”
To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity
E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities, and publication dates of documents are the temporal criteria
Scenario 1: query “Pope Benedict XVI” with documents written before 2005; documents about “Joseph Alois Ratzinger” are relevant
Scenario 2: query “Hillary R. Clinton” with documents written from 1997 to 2002; documents about “New York Senator” and “First Lady of the United States” are relevant
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
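Step 1 can be sketched over a revision history as follows, assuming revisions arrive as (title, timestamp, wikitext) tuples, for instance parsed from a Wikipedia XML dump. Each monthly snapshot keeps the latest revision of every page made up to the end of that month; whether the cut-off is the first or the last day of the month is an assumption of this sketch.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Revision = Tuple[str, datetime, str]   # (page title, timestamp, wikitext)
Month = Tuple[int, int]                # (year, month)


def monthly_snapshots(revisions: List[Revision]) -> Dict[Month, Dict[str, str]]:
    """Partition a revision history with granularity g = month.

    Snapshot W_{t_k} maps every page to its latest revision made up to the
    end of month t_k (the cut-off day is an assumption of this sketch).
    """
    revs = sorted(revisions, key=lambda rev: rev[1])
    if not revs:
        return {}
    snapshots: Dict[Month, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    year, month = revs[0][1].year, revs[0][1].month
    last = (revs[-1][1].year, revs[-1][1].month)
    i = 0
    while (year, month) <= last:
        # Apply every revision made in or before this month.
        while i < len(revs) and (revs[i][1].year, revs[i][1].month) <= (year, month):
            page, _, text = revs[i]
            current[page] = text
            i += 1
        snapshots[(year, month)] = dict(current)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return snapshots
```

Copying the current page map for every month is simple but memory-hungry; at full Wikipedia scale a disk-backed store would be needed.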
Example [Bunescu and Pasca, EACL’2006]:
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) At least 75% of the occurrences of the title in the article text itself are capitalized
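The three capitalization rules can be sketched as a single predicate. Bunescu and Pasca's first rule looks at content words; this sketch approximates "content word" with a small, hypothetical list of function words to skip, so it is only a rough stand-in for their heuristic.

```python
import re

# Hypothetical stand-in for a proper content-word (part-of-speech) check.
FUNCTION_WORDS = {"a", "an", "and", "at", "de", "for", "in", "of", "on", "the", "to"}


def is_named_entity_title(title: str, article_text: str, threshold: float = 0.75) -> bool:
    """Capitalization heuristics for deciding whether a Wikipedia title is a named entity."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # 1) Multi-word title: every content word starts with a capital letter.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS and w[0].isalpha()]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # 2) Single-word title with multiple capital letters (e.g. UNICEF, WHO).
    if sum(1 for c in word if c.isupper()) > 1:
        return True
    # 3) Otherwise, at least 75% of its occurrences in the article text are capitalized.
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    return sum(1 for h in hits if h[0].isupper()) / len(hits) >= threshold
```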
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate the entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
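Anchor-text extraction for one snapshot can be sketched with a regular expression over wikitext. Only piped links of the form [[Target|anchor]] (optionally with a #section) are handled; plain [[Target]] links, templates and redirects are ignored, which is a simplification. Targets are normalized as MediaWiki does by default on English Wikipedia (spaces to underscores, first letter upper-cased).

```python
import re
from collections import Counter
from typing import Dict, Set

# [[Target|anchor]] or [[Target#Section|anchor]]
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def extract_synonyms(wikitext: str, entities: Set[str]) -> Dict[str, Counter]:
    """Map each entity title to a Counter of anchor texts that link to it."""
    synonyms: Dict[str, Counter] = {}
    for target, anchor in LINK.findall(wikitext):
        title = target.strip().replace(" ", "_")
        title = title[:1].upper() + title[1:]   # first letter of a title is case-insensitive
        if title in entities:
            synonyms.setdefault(title, Counter())[anchor.strip()] += 1
    return synonyms
```

Summing these counters over all articles of a snapshot $W_{t_k}$ yields the pairs in $S_{t_k}$ together with their frequencies.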
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
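Given the relationships found in each monthly snapshot, a time period can be assigned as the span from the first to the last snapshot containing the pair. That reading of "timestamps of Wikipedia articles in which they appear" is an assumption of this sketch, and gaps between appearances are ignored.

```python
from typing import Dict, Set, Tuple

Month = Tuple[int, int]            # (year, month)
Pair = Tuple[str, str]             # (named entity, synonym)


def time_periods(snapshots: Dict[Month, Set[Pair]]) -> Dict[Pair, Tuple[Month, Month]]:
    """Time period of each entity-synonym pair: its first and last snapshot month."""
    periods: Dict[Pair, Tuple[Month, Month]] = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            start = periods[pair][0] if pair in periods else month
            periods[pair] = (start, month)
    return periods
```

Because Wikipedia only began in 2001, periods derived this way can start long after a synonym was actually in use, which is what the NYT burst-detection step is meant to correct.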
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Output: results from the burst-detection algorithm
Synonym | Entity | Burst weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition:
Class A, time-independent: robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided). E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B, time-dependent: related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered. E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs.
$tf(s_j)$ is the averaged term frequency of $s_j$ over the time partitions:
$$tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
$\mu$ weighs the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
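The sketch below transcribes the TIDP formula directly. It assumes the per-partition term frequencies of one synonym are given as a dictionary keyed by partition. The function name `tidp` and the partition keys are hypothetical. The sketch uses raw counts exactly as the formula is written, with μ = 0.5 as the default:

```python
from typing import Dict


def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * averaged tf(s_j)."""
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)              # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf     # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf
```

Take a synonym that occurs 4 and 2 times in two of three partitions. Then pf = 2, the averaged tf = 3, and TIDP = 0.5 · 2 + 0.5 · 3 = 2.5. The formula as given does not say whether pf and the averaged tf are normalized. Because they live on different scales, normalizing them before mixing may be worth considering.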
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
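Ranking by TDP reduces to sorting the synonyms observed at $t_k$ by their term frequency. The sketch below assumes frequencies are stored in a dictionary keyed by (synonym, time partition). The function name and the counts in the test are invented for illustration:

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf: Dict[Tuple[str, str], int], t_k: str,
                        top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    # Keep only synonyms that actually occur in partition t_k.
    scored = [(syn, count) for (syn, t), count in tf.items() if t == t_k and count > 0]
    # Highest frequency first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```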
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving the time of synonyms
Data collection:
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001 to 01/03/2008), about 2.8 terabytes. Four additional snapshots were also used (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009).
The New York Times Annotated Corpus, which contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm as implemented by Kleinberg, with these settings:
- number of states = 2
- ratio of the rate of the second state to the base state = 2
- ratio of the rate of each subsequent state to the previous state = 2
- gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection: TREC Robust Track (2004), 250 topics (topics 301-450 and 601-700)
Tools: Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k, where ^w_i attaches the weight w_i to synonym s_i as a boost
Measurement: Mean Average Precision (MAP), R-precision and recall
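The expanded query can be assembled as a string in the `s^w` boosting notation of q_exp. This is a sketch only, and it makes two assumptions. First, multi-word synonyms are quoted as phrases. Second, weights are printed with two decimals. Check against Terrier's query-language documentation whether a given Terrier version accepts a boost on a quoted phrase. The function name `expand_query` and the TIDP scores in the test are hypothetical.

```python
from typing import Dict


def expand_query(q_org: str, tidp_scores: Dict[str, float], k: int = 3) -> str:
    """Append the top-k synonyms to q_org, each boosted by its TIDP score."""
    # Highest score first; ties broken alphabetically for a stable order.
    top = sorted(tidp_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so they match as phrases.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```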
Query expansion using time-dependent synonyms
Data collection: NewsLibrary.com, which contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries:
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | # NE | # NE-Syn | Avg. # Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with two filtering criteria:
1) time interval < 6 months
2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust 2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2, because k > 2 brings noise into the NERQ process
Number of queries under the two NERQ methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (statistical significance at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo-relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo-relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo-relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, and the DFR term weighting model Bose-Einstein 1.
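The measures reported in these tables can be computed per topic as shown below. MAP is the mean of average precision over all topics. This is a minimal sketch of the standard definitions and not the evaluation tool used in the experiments. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top k results that are relevant (always divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at(ranked, relevant, len(relevant)) if relevant else 0.0


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that were retrieved at all."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over topics, each given as (ranked list, set of relevant documents)."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```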
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (statistical significance at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms w.r.t. time $t_q$
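A TSQ-style expansion can be sketched in two steps. Keep the keyword $w_q$, then add every synonym whose validity interval overlaps the query time $t_q$, given here as a year range. The synonym dictionary and its intervals are hypothetical and serve only to illustrate the overlap test. In the approach described here, the periods would come from Wikipedia snapshots refined by burst detection, and the candidates would be ranked by TDP.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: entity -> [(synonym, first year valid, last year valid)].
# The intervals are illustrative only.
TIME_SYNONYMS: Dict[str, List[Tuple[str, int, int]]] = {
    "Pope Benedict XVI": [("Cardinal Ratzinger", 1977, 2005)],
    "Barack Obama": [("Senator Obama", 2005, 2008)],
}


def tsq(w_q: str, t_start: int, t_end: int) -> List[str]:
    """Expand keyword w_q with synonyms valid at some point in [t_start, t_end]."""
    terms = [w_q]
    for synonym, first, last in TIME_SYNONYMS.get(w_q, []):
        if first <= t_end and t_start <= last:  # the two intervals overlap
            terms.append(synonym)
    return terms
```

For example, the query ("Pope Benedict XVI", 1988-2005) expands to include "Cardinal Ratzinger". The same keyword with the period 2006-2009 stays unexpanded.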
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives.
Searching in such resources is not straightforward, because their contents are strongly time-dependent.
Example: for the query "Pope Benedict XVI" with dates "before 2005", a search is unable to retrieve documents about "Joseph Alois Ratzinger".
To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed.
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008].
Named entities are very dynamic in appearance, i.e., relationships between terms change over time, e.g., through changes of roles, name alterations or semantic shift.
Synonyms are different words with similar meanings.
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity.
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005.
What are time-based synonyms?
Time-independent synonyms are invariant to time.
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time.
Application
News archive search: search terms are named entities, and the publication dates of documents are the temporal criteria.
Scenario 1: the query is "Pope Benedict XVI", for documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: the query is "Hillary R. Clinton", for documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge: semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
- discover time-based synonyms over time
- improve the accuracy of the time of synonyms
- expand a query using time-based synonyms
3. Experiments:
- evaluate extracting and improving the time of synonyms
- evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Figure: A snapshot of Wikipedia and the current revisions at time $t_k$
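Step 1 can be sketched as taking, for each page, its latest revision at or before $t_k$, with one snapshot point per month. The sketch assumes revisions are given as (title, timestamp, text) tuples. The helper names `snapshot` and `monthly_points` are hypothetical. As a check, the monthly points from 01/03/2001 to 01/03/2008 number 85, which matches the snapshot count reported for the Wikipedia history.

```python
from datetime import date
from typing import Dict, List, Tuple

Revision = Tuple[str, date, str]  # (page title, revision timestamp, wikitext)


def snapshot(revisions: List[Revision], t_k: date) -> Dict[str, str]:
    """W_{t_k}: the latest revision of every page at or before t_k."""
    latest: Dict[str, Tuple[date, str]] = {}
    for title, ts, text in revisions:
        if ts <= t_k and (title not in latest or ts > latest[title][0]):
            latest[title] = (ts, text)
    return {title: text for title, (_, text) in latest.items()}


def monthly_points(start: date, end: date) -> List[date]:
    """First day of every month from start to end (granularity g = month)."""
    points: List[date] = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        points.append(date(year, month, 1))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return points
```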
Example heuristics [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) Titles for which at least 75% of the occurrences in the article text itself are capitalized
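The sketch below simplifies the three capitalization heuristics. It assumes a small hand-picked list of function words, which is hypothetical. It uses the article text only as a fallback for titles that the first two rules do not accept. It is an illustration in the spirit of Bunescu and Pasca, not their exact procedure.

```python
import re
from typing import Optional

# Assumed list of function words ignored by the capitalization check.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to"}


def is_named_entity(title: str, article_text: Optional[str] = None,
                    threshold: float = 0.75) -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Heuristic 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(c.isupper() for c in words[0]) > 1:
        # Heuristic 2: single-word title with multiple capital letters.
        return True
    if article_text:
        # Heuristic 3: the title is capitalized in at least 75% of its
        # occurrences in the article text itself.
        phrase = " ".join(words)
        hits = re.findall(re.escape(phrase), article_text, flags=re.IGNORECASE)
        if hits:
            return sum(h[0].isupper() for h in hits) / len(hits) >= threshold
    return False
```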
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
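Step 1 can be sketched with a regular expression over wikitext. The expression collects piped links of the form `[[Target|anchor]]` and keeps the anchor as a synonym of the target entity. The sketch makes these simplifying assumptions:
- Unpiped links `[[Target]]` are skipped, because their anchor equals the title.
- `#section` suffixes are dropped.
- Spaces in targets are normalized to underscores.

The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# [[Target|anchor]] or [[Target#Section|anchor]]; captures (Target, anchor).
PIPED_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?\|([^\]]+)\]\]")


def extract_synonyms(pages: Dict[str, str], entities: Set[str]) -> Dict[str, Set[str]]:
    """Map each named-entity title to the anchor texts that link to it."""
    synonyms: Dict[str, Set[str]] = defaultdict(set)
    for wikitext in pages.values():
        for target, anchor in PIPED_LINK.findall(wikitext):
            target = target.strip().replace(" ", "_")
            if target in entities:
                synonyms[target].add(anchor.strip())
    return dict(synonyms)
```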
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD’2002]
Generate bursty periods of $\xi_{i,j}$ by computing its rate of occurrence in document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
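The burst-detection step can be made concrete with a from-scratch sketch of Kleinberg's two-state batched model. It uses the parameter values listed in the experiment setting (two states, rate ratio s = 2, gamma = 1). This is an illustration only, not the Kleinberg implementation used in the work. It assumes the NYT stream has already been binned into time windows, with r[t] documents mentioning a synonym out of d[t] documents in window t. The burst weight is computed as the total cost saved by being in the bursty state over the interval. All names are hypothetical.

```python
import math

def _cost(r, d, p):
    """Negative log-likelihood of r relevant docs among d under rate p (binomial)."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1.0 - p))

def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched burst detection (after Kleinberg, KDD 2002).

    r[t]: documents in window t that mention the synonym; d[t]: all documents in window t.
    Returns (start, end, weight) for each maximal run of windows in the bursty state.
    """
    n = len(r)
    total_r, total_d = sum(r), sum(d)
    if n == 0 or total_r == 0 or total_r == total_d:
        return []  # constant rate: nothing to detect
    p0 = total_r / total_d
    rates = (p0, min(s * p0, 0.9999))  # base rate and bursty rate, kept below 1
    up = gamma * math.log(n)  # cost of moving from state 0 to state 1; moving down is free

    # Viterbi over states {0, 1}; the automaton starts in state 0.
    costs = [_cost(r[0], d[0], rates[0]), up + _cost(r[0], d[0], rates[1])]
    back = []  # back[t - 1][j] = best previous state for state j at window t
    for t in range(1, n):
        c0, c1 = (_cost(r[t], d[t], p) for p in rates)
        from0 = 0 if costs[0] <= costs[1] else 1
        from1 = 0 if costs[0] + up < costs[1] else 1
        new0 = costs[from0] + c0
        new1 = (costs[0] + up if from1 == 0 else costs[1]) + c1
        back.append((from0, from1))
        costs = [new0, new1]

    # Backtrack the cheapest state sequence.
    state = 0 if costs[0] <= costs[1] else 1
    states = [state]
    for t in range(n - 2, -1, -1):
        state = back[t][state]
        states.append(state)
    states.reverse()

    # Collect maximal bursty intervals; weight = cost saved by using the bursty rate.
    bursts, start = [], None
    for t, st in enumerate(states + [0]):  # trailing 0 closes a final burst
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(_cost(r[u], d[u], rates[0]) - _cost(r[u], d[u], rates[1])
                         for u in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts
```

For example, detect_bursts([1, 1, 1, 8, 9, 8, 1, 1], [10] * 8) reports a single burst covering windows 3 to 5. A higher gamma makes entering the bursty state more expensive, so only stronger or longer bursts survive.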
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym           Entity                  Burst Weight  Time Start  Time End
President Reagan  Ronald Reagan           5506858       01/1987     02/1989
President Ronald  Ronald Reagan           100401        01/1989     03/1990
President Ronald  Ronald Reagan           67208         07/1990     02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001     10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002     01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003     11/2004
3 Query Expansion
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the tf of $s_j$ averaged over the time partitions in which it occurs: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
$\mu$ weights the temporal feature against the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
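The TIDP score can be computed directly from per-partition term frequencies. This is a minimal sketch that assumes raw, unnormalized pf and tf values plugged straight into the formula above. The paper may normalize the two features before mixing them, which this sketch does not attempt. Function and argument names are hypothetical.

```python
def tidp(partition_tf, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j) for one synonym s_j.

    `partition_tf` maps each time partition p_i to tf(s_j, p_i).
    """
    pf = sum(1 for tf in partition_tf.values() if tf > 0)  # partitions where s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(partition_tf.values()) / pf  # zero partitions add nothing to the sum
    return mu * pf + (1 - mu) * avg_tf

def rank_time_independent(synonym_tfs, mu=0.5):
    """Rank synonyms (name -> {partition: tf}) by descending TIDP score."""
    scores = {s: tidp(tfs, mu) for s, tfs in synonym_tfs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With mu = 0.5, a synonym seen in two partitions with tf 4 and 2 scores 0.5 * 2 + 0.5 * 3 = 2.5.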
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$TDP(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
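By contrast, TDP needs only the frequencies at the time of interest. This is a minimal sketch with hypothetical names. Dropping synonyms that have no occurrences at t_k is an assumption, based on the intuition that only synonyms in t_k matter.

```python
def rank_time_dependent(synonym_tfs, t_k):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    `synonym_tfs` maps synonym -> {time period: tf}; synonyms with a zero
    or missing tf at t_k are dropped.
    """
    scored = [(s, tfs.get(t_k, 0)) for s, tfs in synonym_tfs.items()]
    return sorted([x for x in scored if x[1] > 0], key=lambda kv: kv[1], reverse=True)
```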
4 Evaluation
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms: data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of rate of second state to base state 2; ratio of rate of each subsequent state to previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\ s_1{\wedge}w_1\ s_2{\wedge}w_2 \ldots s_k{\wedge}w_k$
Measurement: Mean Average Precision (MAP), R-precision and Recall
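The expanded query can be assembled as a plain query string. This is a minimal sketch that assumes a term^weight boost syntax: the slide writes the boost as ∧w, and Terrier's query language uses ^ for term weights. Multi-word synonyms are quoted as phrases. Whether a given engine accepts boosts on quoted phrases should be checked against its query-language documentation. The function name and the weight formatting are hypothetical choices.

```python
def build_expanded_query(original, ranked_synonyms, k):
    """Return q_exp = q_org s1^w1 ... sk^wk from (synonym, TIDP score) pairs.

    `ranked_synonyms` must already be sorted by descending score.
    """
    parts = [original]
    for synonym, weight in ranked_synonyms[:k]:
        term = f'"{synonym}"' if " " in synonym else synonym  # phrase for multi-word names
        parts.append(f"{term}^{weight:g}")  # :g drops trailing zeros (1.0 -> "1")
    return " ".join(parts)
```

For instance, expanding "barack obama" with k = 1 and the pair ("Barack Hussein Obama II", 2.5) yields: barack obama "Barack Hussein Obama II"^2.5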
Query expansion using time-dependent synonyms: data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Selected 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (named entity and time period) and their synonyms:
Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H. W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications
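The effectiveness measures used in the two setups above (MAP, R-precision, recall, and precision at 10, 20 and 30 documents) have standard definitions. Below is a compact per-query sketch; MAP is the mean of average precision over all queries. The function names are hypothetical, and official TREC figures are normally produced with the trec_eval tool rather than code like this.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0

def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```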
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method  NE         NE-Syn     Avg. Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2              51
BPCF-NERW   473,829    488,383    1.0              73
BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries using the two different NER methods:
Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250
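The two query-recognition variants can be sketched as follows. The sketch makes two assumptions that the slide does not specify. First, "exactly matched" is taken to mean that the query, normalized to Wikipedia title form, equals a named-entity page title. Second, related pages come from some retrieval function supplied by the caller. The normalization rule and all names are hypothetical. With k = 2, as above, only the top two related pages are consulted.

```python
def to_title(query):
    """Hypothetical normalization to Wikipedia title form: underscores, capitalized first letter."""
    q = "_".join(query.split())
    return q[:1].upper() + q[1:]

def recognize_entities(query, entity_titles, related_pages=None, k=2):
    """MW-NERQ when related_pages is None, MRW-NERQ otherwise.

    entity_titles: set of named-entity page titles.
    related_pages: optional callable, query -> ranked list of page titles.
    """
    found = []
    title = to_title(query)
    if title in entity_titles:
        found.append(title)  # exactly matched page
    if related_pages is not None:
        for page in related_pages(query)[:k]:  # top-k related pages
            if page in entity_titles and page not in found:
                found.append(page)
    return found
```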
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance level p < 0.05)
                    MW-NERQ                         MRW-NERQ
Method     MAP     R-precision  Recall     MAP     R-precision  Recall
PM         0.2889  0.3309       0.6185     0.2455  0.2904       0.5629
PRF        0.3469  0.3711       0.6944     0.3002  0.3227       0.6761
SQE-PRF    0.3608  0.3652       0.7405     0.2507  0.2665       0.5932
SWQE-PRF   0.3653  0.3861       0.7388     0.2885  0.3080       0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level p < 0.05)
Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
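A temporal query pairs a keyword w_q with a time t_q, and TSQ adds the synonyms that are valid at t_q. The slide does not show how that selection is done, so here is a minimal sketch under stated assumptions. Each synonym is assumed to carry a (start_year, end_year) validity period, such as one derived from the Wikipedia and burst-detection steps. t_q is treated as a year range, and a synonym is kept when its period overlaps t_q. The overlap rule, the toy periods in the usage note, and all names are illustrative, not the paper's exact procedure.

```python
def expand_temporal_query(keyword, t_q, synonym_periods):
    """Return the keyword plus the synonyms whose validity period overlaps t_q.

    t_q: (start_year, end_year) of the temporal query.
    synonym_periods: {synonym: (start_year, end_year)} for the keyword's entity.
    """
    q_start, q_end = t_q
    terms = [keyword]
    for synonym, (s_start, s_end) in synonym_periods.items():
        overlaps = s_start <= q_end and q_start <= s_end
        if overlaps and synonym != keyword:  # skip the keyword itself
            terms.append(synonym)
    return terms
```

With toy periods {"Cardinal Ratzinger": (1977, 2005), "Pope Benedict XVI": (2005, 2009)}, the query ("Pope Benedict XVI", (1988, 2004)) expands to ["Pope Benedict XVI", "Cardinal Ratzinger"].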
QUEST: Query Expansion using Synonyms over Time
5 Conclusions
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
1 Introduction
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives
Searching in such resources is not straightforward, because their contents are strongly time-dependent
E.g., for the query “Pope Benedict XVI” with dates “before 2005”, a search is unable to retrieve documents about “Joseph Alois Ratzinger”
To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
Named entities are very dynamic in appearance, i.e., relationships between terms change over time, e.g., through changes of roles, name alterations or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity
E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities, and publication dates of documents are temporal criteria
Scenario 1: query “Pope Benedict XVI”, documents written before 2005. Documents about “Joseph Alois Ratzinger” are relevant
Scenario 2: query “Hillary R. Clinton”, documents written from 1997 to 2002. Documents about “New York Senator” and “First Lady of the United States” are relevant
Challenge: semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1 Formal models: Wikipedia viewed as a temporal resource
2 Proposed approaches: discover time-based synonyms over time; improve the accuracy of time of synonyms; expand a query using time-based synonyms
3 Experiments: evaluate extracting and improving time of synonyms; evaluate query expansion using time-based synonyms
2 Synonym Detection
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
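Step 1 amounts to taking, for each snapshot time t_k, the latest revision of every page made at or before t_k (the "current revisions" in the figure). This is a minimal in-memory sketch. It assumes revisions are available as (page, timestamp, text) tuples with ISO-8601 timestamps, and that the caller supplies the monthly cutoffs. Page deletions and renames are ignored. The names are hypothetical; the slides list MWDumper and Oracle Berkeley DB as the tools actually used for the full history.

```python
from bisect import bisect_right
from collections import defaultdict

def build_snapshots(revisions, cutoffs):
    """Return {cutoff: {page: text}} holding each page's latest revision at or before the cutoff.

    revisions: iterable of (page, timestamp, text); ISO-8601 timestamps
    compare chronologically as plain strings.
    cutoffs: snapshot times t_1..t_z, e.g. "2001-03-01T00:00:00Z".
    """
    history = defaultdict(list)
    for page, ts, text in revisions:
        history[page].append((ts, text))
    for revs in history.values():
        revs.sort()  # chronological order per page
    stamps = {page: [ts for ts, _ in revs] for page, revs in history.items()}

    snapshots = {}
    for cutoff in cutoffs:
        current = {}
        for page, revs in history.items():
            idx = bisect_right(stamps[page], cutoff)  # revisions made at or before the cutoff
            if idx:  # the page existed at the cutoff
                current[page] = revs[idx - 1][1]
        snapshots[cutoff] = current
    return snapshots
```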
Example [Bunescu and Pasca, EACL’2006]:
1) Multi-word titles in which all words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
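The three capitalization rules can be sketched as a single title classifier. The slide lists only the rules, so two details here are assumptions. First, rule 1 ignores a small, hypothetical set of function words ("of", "the", ...), so that President_of_the_United_States passes, as the example requires. Second, the rules are applied in the order multi-word, then multiple capitals, then the 75% test on the article text. The function name and word list are hypothetical.

```python
import re

# Hypothetical function-word list ignored by rule 1.
FUNCTION_WORDS = {"of", "the", "a", "an", "and", "or", "in", "on", "at", "for", "to", "by"}

def is_named_entity(title, article_text=""):
    """Apply the three capitalization rules to a Wikipedia title (underscore form)."""
    words = title.split("_")
    if len(words) > 1:
        # Rule 1: multi-word title whose (content) words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[:1].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters (UNICEF, WHO).
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[:1].isupper())
    return capitalized / len(occurrences) >= 0.75
```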
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_i = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst-detection algorithm [Kleinberg, KDD’2002]
Generate bursty periods of $\xi_{ij}$ by computing its rate of occurrence in document streams
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
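The following is a compact two-state version of Kleinberg's batched burst model. It is shown only to illustrate how bursty intervals and weights can be computed. It is not the implementation used in the talk. The assumptions are as follows:
- Documents are grouped into time batches, such as months of NYT articles.
- `r[t]` counts the documents in batch t that mention the synonym, and `d[t]` counts all documents in that batch.
- The defaults `s=2` and `gamma=1` mirror the burst-detection settings listed in the experiment setup.

```python
import math


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state Kleinberg burst detection (sketch) over batched document counts.

    Returns a list of (start, end, weight) with inclusive batch indices.
    """
    n = len(r)
    R, D = sum(r), sum(d)
    if n == 0 or R == 0 or D == 0:
        return []
    # Base rate and bursty rate, clamped below 1 so that log(1 - p) stays finite.
    p = [min(R / D, 0.9999), min(s * R / D, 0.9999)]
    trans = gamma * math.log(n)  # cost of moving up from state 0 to state 1

    def cost(i, t):
        # Negative log-likelihood of r[t] hits in d[t] documents under state i.
        # The binomial coefficient is omitted: it is equal for both states and cancels.
        return -(r[t] * math.log(p[i]) + (d[t] - r[t]) * math.log(1 - p[i]))

    # Viterbi: the sequence starts in the base state 0, so entering state 1 at t=0 costs `trans`.
    best = [cost(0, 0), trans + cost(1, 0)]
    back = []  # back[t-1][j] = best predecessor state of state j at batch t
    for t in range(1, n):
        prev0, prev1 = best
        # Moving down (1 -> 0) is free; moving up (0 -> 1) costs `trans`.
        from0 = (prev0, 0) if prev0 <= prev1 else (prev1, 1)
        from1 = (prev1, 1) if prev1 <= prev0 + trans else (prev0 + trans, 0)
        back.append((from0[1], from1[1]))
        best = [from0[0] + cost(0, t), from1[0] + cost(1, t)]

    # Backtrack the optimal state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    states.reverse()

    # Maximal runs of state 1 are bursts; weight = total cost saved by being bursty.
    bursts, start = [], None
    for t, st in enumerate(states + [0]):  # trailing 0 closes a burst that reaches the end
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts
```

As a usage example, `detect_bursts([1, 1, 1, 10, 12, 11, 1, 1], [20] * 8)` reports a single burst covering batches 3 to 5 with a positive weight. A flat series such as `[2] * 6` out of `[20] * 6` yields no bursts.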
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot \mathit{pf}(s_j) + (1 - \mu) \cdot \mathit{tf}(s_j)$$
$\mathit{pf}(s_j)$ is the number of time partitions in which $s_j$ occurs
$\mathit{tf}(s_j)$ is the averaged tf of $s_j$ over the time partitions in which it occurs:
$$\mathit{tf}(s_j) = \frac{\sum_i \mathit{tf}(s_j, p_i)}{\mathit{pf}(s_j)}$$
$\mu$ weights the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
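The TIDP score translates almost directly into code. In this sketch, each synonym's term frequencies are given as one count per time partition, `tidp` and `rank_time_independent` are hypothetical names, and ties are broken alphabetically (an assumption). The slides do not say whether pf and tf are normalized before mixing, so the raw values are combined exactly as the formula is written.

```python
def tidp(partition_tfs, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j), with tf averaged over occurring partitions.

    partition_tfs: term frequencies tf(s_j, p_i) of one synonym, one entry per time partition.
    """
    pf = sum(1 for tf in partition_tfs if tf > 0)  # number of partitions containing s_j
    if pf == 0:
        return 0.0
    avg_tf = sum(partition_tfs) / pf  # sum over all partitions divided by pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonym_tfs, mu=0.5, k=None):
    """Return (synonym, TIDP score) pairs, highest score first; ties broken alphabetically."""
    scored = [(s, tidp(tfs, mu)) for s, tfs in synonym_tfs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]  # k=None keeps every synonym
```

With $\mu = 0.5$, a synonym seen 3 times in each of 4 partitions scores 0.5 · 4 + 0.5 · 3 = 3.5. A synonym seen 4 and 2 times in only 2 partitions scores 0.5 · 2 + 0.5 · 3 = 2.5. The more time-robust synonym therefore ranks first.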
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = \mathit{tf}(s_j, t_k)$$
where $\mathit{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: Only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
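Because TDP uses only the term frequency at $t_k$, ranking time-dependent synonyms reduces to a lookup and a sort. In this sketch, the nested-dictionary layout, the name `rank_time_dependent`, and the decision to drop synonyms with zero frequency at $t_k$ are assumptions.

```python
def rank_time_dependent(tf_by_time, t_k, k=None):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_by_time: {synonym: {time partition: term frequency}}.
    Synonyms that do not occur at t_k are dropped (assumption).
    """
    scores = {s: tfs.get(t_k, 0) for s, tfs in tf_by_time.items()}
    ranked = sorted((s for s in scores if scores[s] > 0), key=lambda s: (-scores[s], s))
    return [(s, scores[s]) for s in ranked[:k]]
```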
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001 to 01/03/2008), about 2.8 terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst-detection algorithm implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
q_exp = q_org s_1^w_1 s_2^w_2 … s_k^w_k, where w_i is the TIDP score of synonym s_i
Measurement: Mean Average Precision (MAP), R-precision and recall
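The expanded query has the form q_org s_1^w_1 … s_k^w_k. It can be produced as a plain query string, assuming a Terrier-style `term^weight` boosting syntax with double-quoted multi-word phrases. The sketch below only builds the string and does not call any Terrier API. `expand_query` is a hypothetical helper, and the synonyms are expected to arrive already ranked by TIDP.

```python
def expand_query(original_query, ranked_synonyms, k=3):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from (synonym, TIDP weight) pairs, best first."""
    parts = [original_query]
    for synonym, weight in ranked_synonyms[:k]:
        # Quote multi-word synonyms as phrases (assumed query syntax).
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:g}")  # :g prints 3.5 as "3.5" and 2.0 as "2"
    return " ".join(parts)
```

For example, `expand_query("barack obama", [("barack hussein obama ii", 3.5), ("senator obama", 2.5)])` gives `barack obama "barack hussein obama ii"^3.5 "senator obama"^2.5`.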
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney-ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
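The evaluation measures used in these experiments are P@k, R-precision, average precision (averaged into MAP) and recall. They can be computed from a ranked list of document ids and a set of relevant ids. The sketch below uses the textbook definitions. It is not the TREC evaluation tool the authors may have used, and all function names are hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (P@k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each retrieved relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def recall(ranking, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranking) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```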
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. Syn. per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles, with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries using the two different NER methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (significance level p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a Temporal Query expanded with Synonyms w.r.t. time $t_q$
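The difference between TQ and TSQ can be sketched as a selection step. TSQ adds only the synonyms whose time period contains the query time $t_q$, while $t_q$ itself still restricts publication dates. Interval containment is an assumption about what "w.r.t. time $t_q$" means here. The helper names and the dictionary returned by `temporal_synonym_query` are hypothetical. The example periods are year spans loosely derived from the burst-detection output for "President Reagan" and "Senator Clinton".

```python
def synonyms_at(entity, t_q, periods):
    """Synonyms of `entity` whose (start, end) period contains t_q (inclusive)."""
    return sorted(s for (e, s), (start, end) in periods.items()
                  if e == entity and start <= t_q <= end)


def temporal_synonym_query(keyword, t_q, periods):
    """TSQ sketch: keyword w_q plus synonyms valid at t_q; "time" restricts publication dates."""
    return {"terms": [keyword] + synonyms_at(keyword, t_q, periods), "time": t_q}


periods = {
    ("Ronald Reagan", "President Reagan"): (1987, 1989),
    ("Hillary Rodham Clinton", "Senator Clinton"): (2001, 2004),
}
```

A TQ search corresponds to using only `keyword` and `t_q`. With the periods above, a TSQ search for "Ronald Reagan" at 1988 also includes "President Reagan", but at 1995 it adds nothing.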
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Problem statement
In recent years, document archives have become publicly available, e.g., the Internet Archive, digital libraries and news archives
Searching such resources is not straightforward, because their contents are strongly time-dependent
For the query “Pope Benedict XVI” with dates “before 2005”, a search is unable to retrieve documents about “Joseph Alois Ratzinger”. To improve retrieval effectiveness, query expansion using synonyms w.r.t. time can be employed
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR’2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., through changes of roles, name alterations or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity
E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search
Search terms are named entities
Publication dates of documents are temporal criteria
Scenario 1: Query “Pope Benedict XVI”, documents written before 2005. Documents about “Joseph Alois Ratzinger” are relevant
Scenario 2: Query “Hillary R. Clinton”, documents written from 1997 to 2002. Documents about “New York Senator” and “First Lady of the United States” are relevant
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
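Step 1 can be read as a sweep over the revision history: a snapshot at time $t_k$ holds the current revision of every page as of $t_k$, as in the figure. The following is a minimal in-memory sketch under stated assumptions:
- Revisions are available as `(title, timestamp, text)` tuples.
- Page deletions and renames are ignored.
- Each snapshot is taken on the first day of a month.

`monthly_snapshots` and `month_starts` are hypothetical helpers. A full-history dump would need streaming storage rather than dictionaries.

```python
from datetime import datetime


def month_starts(first, last):
    """First day of every month from `first` to `last`, inclusive."""
    y, m = first.year, first.month
    out = []
    while (y, m) <= (last.year, last.month):
        out.append(datetime(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def monthly_snapshots(revisions, snapshot_times):
    """Map each snapshot time t_k to {title: text of the current revision at t_k}."""
    revs = sorted(revisions, key=lambda rev: rev[1])  # chronological order
    snapshots, current, i = {}, {}, 0
    for t_k in sorted(snapshot_times):
        # Apply every revision made up to t_k; later revisions overwrite earlier ones.
        while i < len(revs) and revs[i][1] <= t_k:
            title, _, text = revs[i]
            current[title] = text
            i += 1
        snapshots[t_k] = dict(current)  # copy, so later updates do not leak back
    return snapshots
```

As a sanity check, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` yields 85 dates, which matches the 85 monthly snapshots in the data collection.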
Example [Bunescu and Pasca, EACL’2006]:
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) 75% of the occurrences in the article text itself are capitalized
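The three title heuristics can be approximated as follows. This is an illustrative sketch, not Bunescu and Pasca's code. The small function-word list is a made-up stand-in, which is needed so that "of" and "the" in President_of_the_United_States do not block rule 1. Rule 3 is read as "at least 75% of the title word's occurrences in the article text are capitalized", which is also an assumption.

```python
import re

# Illustrative function-word list (assumption; not the original list).
FUNCTION_WORDS = {"of", "the", "and", "a", "an", "in", "on", "for", "to", "at", "by", "de"}


def is_named_entity(title, article_text=""):
    """Classify a Wikipedia title as a named entity using three capitalization heuristics."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word titles whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word titles with multiple capital letters, e.g. UNICEF, WHO.
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: at least 75% of the word's occurrences in the article text are capitalized.
    occurrences = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```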
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_i$: $(e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate the bursty periods of each relationship $\xi_{ij}$ by computing its rate of occurrence in the document stream
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
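The burst step can be sketched as the two-state, batched variant of Kleinberg's automaton, solved with a Viterbi pass. This is an illustrative sketch, not the implementation used in the experiments. It assumes monthly batches, where `relevant[t]` counts the NYT articles mentioning a synonym and `total[t]` counts all articles in month t. The defaults s = 2 and gamma = 1 mirror the parameter settings listed in the experiment setting. The name `detect_bursts` is hypothetical. The burst weight is taken as the cost saved by the bursty state over the interval, which is how we read Kleinberg's definition.

```python
import math


def _batch_cost(r, d, p):
    """Negative log-likelihood of r relevant documents out of d at rate p.

    The binomial coefficient is omitted: it is identical in both states, so it
    does not change which state sequence is cheapest.
    """
    return -(r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(relevant, total, s=2.0, gamma=1.0):
    """Two-state batched burst detection in the style of Kleinberg (2002).

    Returns a list of (start, end, weight) tuples over batch indices, end inclusive.
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return []
    p0 = min(sum(relevant) / sum(total), 0.9999)  # base-state rate
    p1 = min(s * p0, 0.9999)                      # bursty-state rate, kept below 1
    rates = (p0, p1)
    up = gamma * math.log(n)                      # cost of entering the bursty state

    # Viterbi: the automaton starts in the base state, so state 1 at t=0 pays `up`.
    cost = [_batch_cost(relevant[0], total[0], p0),
            up + _batch_cost(relevant[0], total[0], p1)]
    back = []
    for t in range(1, n):
        new_cost, ptr = [], []
        for j in (0, 1):
            from0 = cost[0] + (up if j == 1 else 0.0)  # moving up is charged
            from1 = cost[1]                            # staying or moving down is free
            prev = 0 if from0 <= from1 else 1
            step = _batch_cost(relevant[t], total[t], rates[j])
            new_cost.append(min(from0, from1) + step)
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)

    # Backtrack the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    states.reverse()

    # Group consecutive bursty batches; weight = total cost saved by state 1.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += (_batch_cost(relevant[t], total[t], p0)
                           - _batch_cost(relevant[t], total[t], p1))
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

The original algorithm supports more than two states. Restricting it to two keeps the sketch short and matches the "number of states: 2" setting.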
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
$pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the average term frequency of $s_j$ over the time partitions in which it occurs:
$$\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
$\mu$ controls the relative importance of the temporal feature and the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
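TIDP is a plain weighted sum, so it is cheap to compute once per-partition term frequencies are known. Below is a minimal sketch. It assumes that tf values are supplied per time partition as a dictionary. The function names are hypothetical, and mu defaults to 0.5, the value reported above. The formula as given does not mention normalization, so pf and the average tf are combined on their raw scales here.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition: {partition_id: tf(s_j, p_i)} for one synonym s_j.
    pf(s_j) counts the partitions where s_j actually occurs (tf > 0);
    avg_tf(s_j) is the sum of tf over those partitions divided by pf(s_j).
    """
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf
    return mu * pf + (1.0 - mu) * avg_tf


def rank_time_independent(synonyms, mu=0.5, k=3):
    """Return the top-k (synonym, TIDP score) pairs, highest score first.

    synonyms: {synonym: {partition_id: tf}} (hypothetical layout).
    """
    scored = [(s, tidp(parts, mu)) for s, parts in synonyms.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, a synonym seen in three partitions with tf values 3, 5 and 4 has pf = 3 and an average tf of 4, giving 0.5 · 3 + 0.5 · 4 = 3.5.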
Ranking time-dependent synonyms
Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
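Because TDP is just the term frequency at $t_k$, ranking reduces to a lookup and a sort. The following sketch assumes a hypothetical `{synonym: {time: tf}}` layout keyed by "YYYY-MM" strings, and it drops synonyms that do not occur at $t_k$. The counts in the test are toy values, not results.

```python
def rank_time_dependent(tf_at_time, t_k, k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time: {synonym: {time: tf}} (hypothetical layout).
    Synonyms with tf = 0 at t_k are dropped; ties keep input order.
    """
    scored = [(s, counts.get(t_k, 0)) for s, counts in tf_at_time.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```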
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 28 Terabytes; plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights $w_i = \mathrm{TIDP}(s_i)$:
$$q_{exp} = q_{org}\; s_1{}^{\wedge}w_1\; s_2{}^{\wedge}w_2\; \cdots\; s_k{}^{\wedge}w_k$$
Measurement: Mean Average Precision (MAP), R-precision, and recall
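The expanded query can be assembled as a string in which each synonym carries its TIDP weight. This is a hypothetical helper, not Terrier's API. It assumes a `term^weight` boosting syntax and quotes multi-word synonyms as phrases. Whether a quoted phrase accepts a `^weight` suffix depends on the engine's query parser, so this form should be checked against the query-language documentation before use.

```python
def expand_query(q_org, ranked_synonyms, k=5):
    """Append the top-k synonyms to the original query with boosting weights.

    ranked_synonyms: [(synonym, tidp_score), ...], highest score first.
    Multi-word synonyms are quoted as phrases; the '"phrase"^weight' form is an
    assumption about the target engine's query syntax.
    """
    parts = [q_org]
    for synonym, weight in ranked_synonyms[:k]:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:g}")
    return " ".join(parts)
```

For instance, `expand_query("pope benedict", [("joseph ratzinger", 2.5)])` produces the original query followed by the quoted synonym boosted by 2.5.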
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20, and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms:
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney-ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
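For reference, the precision-at-k and average-precision measures used in both setups can be computed as shown below. These are standard textbook definitions, and MAP is the mean of average precision over topics. The function names are hypothetical, and this is not the TREC evaluation tool.

```python
def precision_at_k(ranking, relevant, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of P@i at every rank i that holds a relevant document,
    divided by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant)


def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Under these definitions, a single relevant document in the top 10 gives P@10 = 0.1.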
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | NE | NE-Syn | Avg. Syn. per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization", or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries by type under the two NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision, and recall (a marker indicates statistical significance at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1 (Bo1)
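The PRF baselines expand the query from the top retrieved documents. Below is a textbook Rocchio-style sketch that uses positive feedback only and assumes documents are pre-tokenized lists of terms. It is not Terrier's implementation, which weights expansion terms with the Bose-Einstein 1 DFR model. The alpha and beta defaults are illustrative assumptions, while n_docs = 10 and n_terms = 40 mirror the note above. The name `rocchio_prf` is hypothetical.

```python
from collections import Counter


def rocchio_prf(query_terms, ranked_docs, n_docs=10, n_terms=40, alpha=1.0, beta=0.75):
    """Rocchio-style pseudo relevance feedback (positive feedback only).

    query_terms: list of query tokens
    ranked_docs: retrieved documents, best first, each a list of tokens
    Returns the n_terms highest-weighted (term, weight) pairs of the new query.
    """
    feedback = ranked_docs[:n_docs]  # assume the top documents are relevant
    query = Counter({t: alpha * c for t, c in Counter(query_terms).items()})
    for doc in feedback:
        tf = Counter(doc)
        length = sum(tf.values())
        for term, count in tf.items():
            # beta times the length-normalised tf, averaged over feedback documents
            query[term] += beta * (count / length) / len(feedback)
    return query.most_common(n_terms)
```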
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (a marker indicates statistical significance at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms with respect to time $t_q$
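TSQ differs from TQ only in which terms are sent to the engine. The sketch below assumes year-granularity validity periods for each entity-synonym relationship and uses an interval-overlap test to decide which synonyms apply at $t_q$. `TimedSynonym` and `tsq_expand` are hypothetical names, and the validity periods in the test are illustrative.

```python
from typing import List, NamedTuple


class TimedSynonym(NamedTuple):
    entity: str
    synonym: str
    start_year: int  # first year the relationship holds (inclusive)
    end_year: int    # last year the relationship holds (inclusive)


def tsq_expand(keyword: str, t_start: int, t_end: int,
               relations: List[TimedSynonym]) -> List[str]:
    """Return the keyword plus every synonym whose period overlaps [t_start, t_end]."""
    valid = [r.synonym for r in relations
             if r.entity == keyword
             and r.start_year <= t_end and t_start <= r.end_year]
    return [keyword] + valid
```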
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities; publication dates of documents are the temporal criteria
Scenario 1: query "Pope Benedict XVI" restricted to documents written before 2005; documents about "Joseph Alois Ratzinger" are relevant
Scenario 2: query "Hillary R. Clinton" restricted to documents written from 1997 to 2002; documents about "New York Senator" and "First Lady of the United States" are relevant
Challenge
Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
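Step 1 can be sketched as choosing, for every snapshot time $t_k$, the latest revision of each page made at or before $t_k$ (the "current revision" in the figure). This is a sketch under assumptions: revisions are given as hypothetical (title, timestamp, text) tuples, and `monthly_snapshots` is an illustrative name. For clarity it rescans the sorted history per snapshot, whereas a single incremental pass would scale better to the full Wikipedia history.

```python
from datetime import datetime


def monthly_snapshots(revisions, boundaries):
    """Partition a revision history into snapshots W_t1 ... W_tz.

    revisions:  iterable of (title, timestamp, text) tuples (hypothetical layout)
    boundaries: snapshot times t_k, e.g. the first day of each month
    Returns {t_k: {title: text}}, mapping each page to its latest revision
    made at or before t_k, i.e. its current revision at that time.
    """
    ordered = sorted(revisions, key=lambda rev: rev[1])
    snapshots = {}
    for t_k in boundaries:
        current = {}
        for title, timestamp, text in ordered:
            if timestamp > t_k:
                break                # revisions are time-ordered
            current[title] = text    # a later revision overwrites an earlier one
        snapshots[t_k] = current
    return snapshots


if __name__ == "__main__":
    history = [("A", datetime(2001, 3, 15), "v1"), ("A", datetime(2001, 4, 10), "v2")]
    print(monthly_snapshots(history, [datetime(2001, 4, 1), datetime(2001, 5, 1)]))
```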
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all (content) words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) 75% of the occurrences of the title in the article text itself are capitalized
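The three capitalization heuristics can be combined into a single predicate. This sketch rests on several assumptions. Rule 1 checks only content words, because the example title contains a lowercase "of" and "the", and it uses a small hypothetical function-word list. Rule 3 is applied to single-word titles, counts case-insensitive whole-word matches in the article text, and uses a threshold of at least 0.75. The name `is_named_entity_title` is hypothetical.

```python
import re

# Small, illustrative list of function words skipped by rule 1 (an assumption).
FUNCTION_WORDS = {"of", "the", "a", "an", "and", "or", "in", "on", "at",
                  "for", "to", "by", "with", "de", "von"}


def is_named_entity_title(title, article_text=""):
    """Capitalization heuristics for deciding whether a Wikipedia title is a named entity."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: a single-word title with more than one capital letter (UNICEF, WHO).
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: most occurrences of the title in its own article are capitalized.
    pattern = r"\b" + re.escape(word) + r"\b"
    occurrences = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= 0.75
```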
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the average tf of $s_j$ over the time partitions in which it occurs: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
$\mu$ weights the relative importance of the temporal feature and the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
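The following is a direct Python transcription of TIDP. It assumes the per-partition term frequencies have already been counted; the counts below are hypothetical. The slide does not say whether pf and the averaged tf are normalized before they are mixed. This sketch mixes the raw values as written, so rescaling them to comparable ranges may be worth testing in practice.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition maps a time partition p_i to tf(s_j, p_i).
    """
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)                   # partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf          # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


# Hypothetical monthly term frequencies for two synonyms of "Barack Obama".
synonyms = {
    "Barack Hussein Obama II": {"2007-02": 5, "2007-03": 5, "2007-04": 5, "2007-05": 5},
    "Sen. Barack Obama": {"2007-07": 6},
}
ranked = sorted(synonyms, key=lambda s: tidp(synonyms[s]), reverse=True)
print(ranked)  # ['Barack Hussein Obama II', 'Sen. Barack Obama'] (scores 4.5 and 3.5)
```

In this example, the synonym spread across four partitions outranks the one seen once, even though the latter has a higher frequency in its single partition. This reflects the robustness factor above.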
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
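Because TDP is simply the term frequency at $t_k$, ranking reduces to a sort. The sketch below uses hypothetical counts and keys partitions by "YYYY-MM" strings; both the counts and the key format are assumptions made for illustration.

```python
def rank_time_dependent(tf_at, t_k, top_k=3):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at maps each synonym to a dict {time partition: term frequency}.
    Synonyms that do not occur at t_k (TDP = 0) are dropped.
    """
    scores = {s: by_time.get(t_k, 0) for s, by_time in tf_at.items()}
    ranked = sorted((s for s in scores if scores[s] > 0),
                    key=lambda s: scores[s], reverse=True)
    return [(s, scores[s]) for s in ranked[:top_k]]


# Hypothetical counts for synonyms of "Pope Benedict XVI".
tf_at = {
    "Cardinal Joseph Ratzinger": {"2004-06": 12, "2006-01": 1},
    "Joseph Ratzinger": {"2004-06": 7, "2006-01": 3},
    "Pope Benedict XVI": {"2006-01": 40},
}
print(rank_time_dependent(tf_at, "2004-06"))
# [('Cardinal Joseph Ratzinger', 12), ('Joseph Ratzinger', 7)]
```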
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001 to 01/03/2008), about 2.8 terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\ s_1 \wedge w_1\ s_2 \wedge w_2 \cdots s_k \wedge w_k$, where $s_i \wedge w_i$ denotes synonym $s_i$ boosted by weight $w_i$
Measurement: Mean Average Precision (MAP), R-precision and recall
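Building the expanded query string can be sketched as follows. The sketch assumes that the ∧ boost on the slide corresponds to a `term^weight` suffix in the engine's query language. Terrier's query language uses `^` for term weighting, but check its documentation, particularly for boosts on quoted phrases. The synonyms and scores are hypothetical.

```python
def expand_query(q_org, scored_synonyms, k=3):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from (synonym, TIDP score) pairs.

    Multi-word synonyms are quoted as phrases and boosted with a ^weight suffix;
    whether the target engine accepts a boost on a quoted phrase is an assumption.
    """
    top = sorted(scored_synonyms, key=lambda sw: sw[1], reverse=True)[:k]
    return " ".join([q_org] + [f'"{s}"^{w:.2f}' for s, w in top])


# Hypothetical synonyms and TIDP scores.
q = expand_query("Barack Obama",
                 [("Sen. Barack Obama", 3.5), ("Barack Hussein Obama II", 4.5),
                  ("Obama", 1.0)], k=2)
print(q)  # Barack Obama "Barack Hussein Obama II"^4.50 "Sen. Barack Obama"^3.50
```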
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com: more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (a temporal query is a named entity plus a time period):
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | No. of NEs | No. of NE-Syn pairs | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries under the two NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
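The two NERQ methods can be sketched as below. The sketch makes two assumptions. First, `entity_titles` is the set of named-entity page titles produced by the NER step. Second, `related_pages` is a ranked list of Wikipedia titles returned by some search over Wikipedia. The slide does not say how related pages are retrieved, so that input is hypothetical. Matching is case-sensitive for simplicity.

```python
def mw_nerq(query, entity_titles):
    """MW-NERQ: the query is a named entity if it exactly matches an entity page title."""
    title = query.strip().replace(" ", "_")
    return [title] if title in entity_titles else []


def mrw_nerq(query, entity_titles, related_pages, k=2):
    """MRW-NERQ: the exact match plus entity pages among the top-k related pages.

    related_pages is a ranked list of Wikipedia titles retrieved for the query
    by some search method (not specified on the slide).
    """
    found = mw_nerq(query, entity_titles)
    for title in related_pages[:k]:
        if title in entity_titles and title not in found:
            found.append(title)
    return found


# Hypothetical entity titles and related-page ranking.
titles = {"Hubble_Space_Telescope", "NASA", "Edwin_Hubble"}
print(mw_nerq("Hubble telescope achievements", titles))   # []
print(mrw_nerq("Hubble telescope achievements", titles,
               ["Hubble_Space_Telescope", "NASA", "Edwin_Hubble"]))
# ['Hubble_Space_Telescope', 'NASA']
```

With k = 2, the third related page is ignored. This is consistent with the observation above that a larger k brings noise.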
Query expansion using time-independent synonyms
MAP, R-precision and recall (statistical significance tested at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
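When reading the table, it helps to compute relative improvements over the PM baseline. The small Python sketch below uses the MW-NERQ MAP column, reading "2889" as 0.2889 and so on. Relative improvements do not depend on that scaling. The calculation says nothing about statistical significance.

```python
def rel_improvement(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# MAP on the MW-NERQ queries, taken from the table (decimal points restored).
map_mw = {"PM": 0.2889, "PRF": 0.3469, "SQE-PRF": 0.3608, "SWQE-PRF": 0.3653}
for method in ("PRF", "SQE-PRF", "SWQE-PRF"):
    gain = rel_improvement(map_mw[method], map_mw["PM"])
    print(f"{method}: {gain:+.1f}% MAP over PM")
```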
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (statistical significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. the time $t_q$
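Precision at k is the fraction of the top k retrieved documents that are relevant. Below is a minimal sketch with hypothetical document ids. With exactly one relevant document in the top 10, P@10, P@20 and P@30 come out as 0.1, 0.05 and about 0.0333. That matches the pattern of the TQ row, although the table itself averages over the 20 queries.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents in `ranking` that are in `relevant`."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical ranking of 30 documents with a single relevant one at rank 3.
ranking = [f"d{i}" for i in range(1, 31)]
relevant = {"d3"}
for k in (10, 20, 30):
    print(f"P@{k} = {precision_at_k(ranking, relevant, k):.4f}")
# P@10 = 0.1000, P@20 = 0.0500, P@30 = 0.0333
```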
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities, and the publication dates of documents serve as temporal criteria
Scenario 1: query "Pope Benedict XVI" for documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: query "Hillary R. Clinton" for documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting synonyms and improving their time; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
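Step 1 can be sketched as follows. The sketch assumes that a snapshot $W_{t_k}$ holds the latest revision of every page at or before $t_k$ (the figure's "current revisions at time $t_k$"). It also assumes snapshots are taken on the first day of each month. The revision tuples are a hypothetical in-memory stand-in for the full-history dump; a real pipeline would stream the dump once rather than rescanning it for every month.

```python
from datetime import datetime


def snapshot(revisions, t_k):
    """W_tk: the latest revision of every page at or before time t_k.

    revisions: iterable of (page_title, timestamp, wikitext) tuples.
    """
    latest = {}
    for title, ts, text in revisions:
        if ts <= t_k and (title not in latest or ts > latest[title][0]):
            latest[title] = (ts, text)
    return {title: text for title, (_, text) in latest.items()}


def monthly_snapshots(revisions, start, end):
    """W = {W_t1, ..., W_tz} with granularity g = month.

    start and end are inclusive (year, month) pairs; each snapshot is taken
    on the first day of its month.
    """
    revisions = list(revisions)
    snapshots = {}
    year, month = start
    while (year, month) <= end:
        snapshots[(year, month)] = snapshot(revisions, datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return snapshots


# Hypothetical revision history.
revs = [
    ("Barack_Obama", datetime(2004, 3, 15), "v1"),
    ("Barack_Obama", datetime(2004, 5, 2), "v2"),
    ("Kmart", datetime(2004, 4, 20), "k1"),
]
for month, pages in monthly_snapshots(revs, (2004, 4), (2004, 6)).items():
    print(month, pages)
```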
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) Titles for which at least 75% of the occurrences in the article text itself are capitalized
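The three heuristics can be sketched in Python as below. The function-word list is an assumption: Bunescu and Pasca exclude prepositions, determiners, conjunctions, relative pronouns and negations from the capitalization check, and the set here only approximates that. Unlike their method, this sketch also counts sentence-initial occurrences in rule 3.

```python
import re

# Assumed list of non-content words ignored by rule 1.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "for", "to", "by",
                  "with", "from", "and", "or", "but", "that", "which", "who", "not"}


def is_named_entity(title, article_text):
    """Bunescu-and-Pasca-style heuristics for deciding whether a page is a named entity."""
    words = title.replace("_", " ").split()
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in its own article are
    # capitalized (sentence-initial occurrences are not excluded here).
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, re.IGNORECASE)
    if not hits:
        return False
    return sum(h[0].isupper() for h in hits) / len(hits) >= 0.75


print(is_named_entity("President_of_the_United_States", ""))  # True
print(is_named_entity("UNICEF", ""))                          # True
print(is_named_entity("Rainbow", "Rainbow colours. A rainbow appears after rain."))  # False
```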
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
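Anchor-text extraction can be sketched with a regular expression over the wikitext of one snapshot. The sketch only handles plain `[[Target|anchor]]` links; it does not handle templates, nested links or redirects. The page contents are hypothetical.

```python
import re
from collections import defaultdict

# [[Target]] or [[Target|anchor text]]
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(pages, entities):
    """Synonym snapshot S_tk as a dict: entity title -> set of anchor texts.

    pages: {title: wikitext} for one snapshot W_tk
    entities: named-entity titles E_tk, with underscores instead of spaces
    """
    synonyms = defaultdict(set)
    for text in pages.values():
        for target, anchor in LINK.findall(text):
            # Strip a "#Section" fragment and normalize spaces to underscores.
            target = target.split("#", 1)[0].strip().replace(" ", "_")
            if anchor and target in entities:
                synonyms[target].add(anchor.strip())
    return dict(synonyms)


# Hypothetical wikitext from one snapshot.
pages = {
    "Page_A": "See [[President_of_the_United_States|Barack Obama]] and [[Chicago]].",
    "Page_B": "Former governor [[President of the United States|George W. Bush]].",
}
print(extract_synonyms(pages, {"President_of_the_United_States"}))
```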
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
pf(s_j) is the number of time partitions in which s_j occurs
tf(s_j) is the tf of s_j averaged over the partitions in which it occurs:
$$tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
μ weighs the temporal feature against the frequency feature
μ = 0.5 yielded the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
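As an illustration, the TIDP weighting can be sketched in Python. This assumes per-partition frequencies are available as a nested dictionary `tf[synonym][partition]` (a hypothetical layout; the counts in the usage example are made up), and it mixes the raw values of pf and the averaged tf, since the slides do not say whether either feature is normalized first.

```python
from typing import Dict, List, Tuple


def tidp_ranking(tf: Dict[str, Dict[str, int]],
                 mu: float = 0.5) -> List[Tuple[str, float]]:
    """Rank synonyms by TIDP(s) = mu * pf(s) + (1 - mu) * avg_tf(s).

    tf[s][p] is the frequency of synonym s in time partition p
    (hypothetical input layout; partitions with zero frequency are ignored).
    """
    scores: Dict[str, float] = {}
    for synonym, per_partition in tf.items():
        counts = [f for f in per_partition.values() if f > 0]
        pf = len(counts)                  # partitions in which s occurs
        if pf == 0:
            continue
        avg_tf = sum(counts) / pf         # tf(s) averaged over those partitions
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    # Highest TIDP score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tf = {
        "Barack Hussein Obama II": {"2007-02": 3, "2008-01": 5, "2009-03": 4},
        "Sen. Barack Obama": {"2007-07": 2, "2007-08": 2},
    }
    print(tidp_ranking(tf))
    # [('Barack Hussein Obama II', 3.5), ('Sen. Barack Obama', 2.0)]
```

With μ = 0.5 the synonym seen in three partitions outranks the one seen in two, even with similar per-partition counts, which matches the "robustness over time" intuition.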
Ranking time-dependent synonyms
Definition: given a time t_k, time-dependent synonyms at t_k are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
where tf(s_j, t_k) is the term frequency of s_j at t_k
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period t_k are of interest
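A matching sketch for TDP, under the same hypothetical `tf[synonym][partition]` layout with made-up counts: only the column for t_k is consulted, so a synonym's frequency in other partitions has no effect on its rank.

```python
from typing import Dict, List, Tuple


def tdp_ranking(tf: Dict[str, Dict[str, int]], t_k: str) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s, t_k) = tf(s, t_k).

    tf[s][t] is the frequency of synonym s in time partition t
    (hypothetical input layout). Other partitions are ignored entirely.
    """
    at_tk = {s: freq[t_k] for s, freq in tf.items() if freq.get(t_k, 0) > 0}
    return sorted(at_tk.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    tf = {
        "Cardinal Joseph Ratzinger": {"2004": 7, "2006": 1},
        "Joseph Ratzinger": {"2004": 3},
        "Pope Benedict XVI": {"2006": 9},
    }
    print(tdp_ranking(tf, "2004"))
    # [('Cardinal Joseph Ratzinger', 7), ('Joseph Ratzinger', 3)]
```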
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, …, 01/03/2008), about 2.8 terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg, with the settings:
Number of states: 2
Ratio of the rate of the second state to the base state: 2
Ratio of the rate of each subsequent state to the previous state: 2
Gamma parameter of the HMM: 1
Measurement: accuracy
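A minimal sketch of a two-state version of Kleinberg's batched burst detection follows. It assumes the settings above map to a base rate p0 = (matching documents)/(all documents), a bursty rate s·p0 with s = 2, and a cost γ·ln(n) for entering the bursty state with γ = 1; it is an illustration, not the implementation used in the experiments. Each batch t (e.g., one month of NYT articles) contributes r[t] matching documents out of d[t].

```python
import math
from typing import List, Tuple


def burst_intervals(r: List[int], d: List[int], s: float = 2.0,
                    gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state Kleinberg burst detection over batched counts.

    r[t] = documents in batch t mentioning the synonym, d[t] = all documents
    in batch t. Returns (start, end, weight) for every maximal run of batches
    assigned to the bursty state.
    """
    n = len(r)
    if n == 0 or sum(d) == 0:
        return []
    p0 = sum(r) / sum(d)                 # base-state rate
    if not 0 < p0 < 1:
        return []
    p1 = min(s * p0, 0.9999)             # bursty-state rate, kept below 1

    def cost(p: float, rt: int, dt: int) -> float:
        # Negative log-likelihood of rt hits out of dt; the binomial
        # coefficient is dropped because it is identical in both states.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    up = gamma * math.log(n)             # cost of moving up into the bursty state
    # Viterbi over states {0, 1}, starting in state 0 before the first batch.
    best = [cost(p0, r[0], d[0]), up + cost(p1, r[0], d[0])]
    back: List[Tuple[int, int]] = []     # best predecessor of states 0 and 1
    for t in range(1, n):
        prev0 = 0 if best[0] <= best[1] else 1      # dropping to 0 is free
        prev1 = 0 if best[0] + up < best[1] else 1
        best = [best[prev0] + cost(p0, r[t], d[t]),
                (best[0] + up if prev1 == 0 else best[1]) + cost(p1, r[t], d[t])]
        back.append((prev0, prev1))

    # Backtrack the optimal state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for preds in reversed(back):
        state = preds[state]
        states.append(state)
    states.reverse()

    # Collect bursty runs; weight = total cost saved by using the bursty rate.
    intervals: List[Tuple[int, int, float]] = []
    t = 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += cost(p0, r[t], d[t]) - cost(p1, r[t], d[t])
                t += 1
            intervals.append((start, t - 1, weight))
        else:
            t += 1
    return intervals


if __name__ == "__main__":
    r = [1, 1, 1, 10, 12, 11, 1, 1, 1, 1]   # illustrative monthly mention counts
    d = [100] * 10                            # illustrative monthly article counts
    print(burst_intervals(r, d))              # one burst covering batches 3..5
```

The burst weight here follows the spirit of Kleinberg's definition: the summed reduction in cost obtained by explaining the interval's batches with the bursty rate instead of the base rate.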
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms s_1, …, s_k, using their TIDP scores as boosting weights:
q_exp = q_org s_1^w_1 s_2^w_2 … s_k^w_k
Measurement: Mean Average Precision (MAP), R-precision and recall
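One way to assemble the expanded query q_exp from TIDP-scored synonyms is sketched below. The `term^weight` form follows the slide's notation; quoting multi-word synonyms as phrases and rounding weights to two decimals are assumptions, and the exact syntax a particular engine accepts may differ.

```python
from typing import List, Tuple


def expand_query(q_org: str, scored: List[Tuple[str, float]], k: int) -> str:
    """Build q_org s1^w1 ... sk^wk from (synonym, TIDP score) pairs."""
    top = sorted(scored, key=lambda item: item[1], reverse=True)[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Phrase-quote multi-word synonyms (assumed convention)
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


if __name__ == "__main__":
    scored = [("Senator Obama", 2.0), ("Barack Hussein Obama II", 3.5), ("Obama", 1.0)]
    print(expand_query("barack obama", scored, k=2))
    # barack obama "Barack Hussein Obama II"^3.50 "Senator Obama"^2.00
```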
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries
Temporal query (named entity) | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
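The effectiveness measures used in both setups (MAP, R-precision, recall and precision at k) can be sketched as follows, assuming each query comes with a ranked list of document IDs and a set of relevant IDs. P@k divides by k even when fewer than k documents are retrieved, which is the usual convention.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant document's rank, divided by |relevant|."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```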
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | NE | NE-Syn | Avg. Syn. per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
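The two BPF-NERW filtering criteria can be sketched as a predicate over a relationship's monthly frequencies. The slide does not pin down the details, so three assumptions are made here: the time interval is the number of months from first to last appearance (inclusive), the average frequency is taken over the months in which the relationship appears, and a relationship is dropped if either criterion holds.

```python
from typing import Dict, Tuple

Month = Tuple[int, int]  # (year, month)


def passes_bpf_filter(monthly_freq: Dict[Month, int], min_months: int = 6,
                      min_avg_freq: float = 2.0) -> bool:
    """Keep a relationship only if it spans >= min_months and its average
    frequency is >= min_avg_freq (interpretation assumed, see above)."""
    months = sorted(m for m, f in monthly_freq.items() if f > 0)
    if not months:
        return False
    (y0, m0), (y1, m1) = months[0], months[-1]
    span = (y1 - y0) * 12 + (m1 - m0) + 1      # inclusive month count
    avg = sum(monthly_freq[m] for m in months) / len(months)
    return span >= min_months and avg >= min_avg_freq
```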
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries using the two different NER methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall ( indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword w_q and a time t_q
TSQ: search a temporal query and expand it with synonyms w.r.t. the time t_q
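To illustrate the difference between TQ and TSQ, the sketch below selects synonyms whose bursty interval overlaps the query period [t_start, t_end] and appends them to the keyword w_q. The overlap test and the ordering by burst weight are assumptions made for illustration, and the weights in the example are made up.

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]                         # (year, month)
Burst = Tuple[str, str, float, Month, Month]    # synonym, entity, weight, start, end


def synonyms_at(entity: str, t_start: Month, t_end: Month,
                bursts: List[Burst]) -> List[str]:
    """Synonyms of `entity` whose bursty interval overlaps [t_start, t_end]."""
    best: Dict[str, float] = {}
    for synonym, ent, weight, start, end in bursts:
        if ent == entity and start <= t_end and end >= t_start:
            best[synonym] = max(weight, best.get(synonym, 0.0))
    # Strongest burst first (an assumed ordering, not from the slides)
    return sorted(best, key=lambda s: best[s], reverse=True)


def temporal_synonym_query(w_q: str, t_start: Month, t_end: Month,
                           bursts: List[Burst]) -> List[str]:
    """TSQ-style expansion: the keyword followed by its time-dependent synonyms."""
    return [w_q] + synonyms_at(w_q, t_start, t_end, bursts)


if __name__ == "__main__":
    bursts = [
        ("President Reagan", "Ronald Reagan", 5.0, (1987, 1), (1989, 2)),
        ("President Ronald", "Ronald Reagan", 1.0, (1989, 1), (1990, 3)),
    ]
    print(temporal_synonym_query("Ronald Reagan", (1989, 1), (1989, 12), bursts))
    # ['Ronald Reagan', 'President Reagan', 'President Ronald']
```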
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson, SIGIR'2008]
They are very dynamic in appearance, i.e., relationships between terms change over time, e.g., changes of roles, name alterations or semantic shift
Synonyms are different words with similar meanings
In our context, synonyms are terms used as name variants (other names, titles or roles) of a named entity
E.g., "Cardinal Joseph Ratzinger" is a synonym of "Pope Benedict XVI" before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time
Application
News archive search: search terms are named entities; publication dates of documents are temporal criteria
Scenario 1: query "Pope Benedict XVI" for documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant
Scenario 2: query "Hillary R. Clinton" for documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
Discover time-based synonyms over time
Improve the accuracy of time of synonyms
Expand a query using time-based synonyms
3. Experiments:
Evaluate extracting and improving time of synonyms
Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots W = {W_t1, …, W_tz}
Step 2: For each snapshot W_tk ∈ W, identify named entity pages to obtain a set of named entities E_tk = {e_1, …, e_j}
Step 3: For each named entity e_i ∈ E_tk, find a set of entity-synonym relationships S_tk = {ξ_11, …, ξ_nm}
Figure: A snapshot of Wikipedia and current revisions at time t_k
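Step 1 could be sketched as follows, assuming revisions arrive as hypothetical `(title, timestamp, wikitext)` tuples and that the snapshot at t_k holds the latest revision of each page created before t_k; page deletions and redirects are ignored. Taking the first day of every month from 03/2001 to 03/2008 yields the 85 snapshot dates mentioned in the experiment setting.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Revision = Tuple[str, datetime, str]   # (page title, timestamp, wikitext)


def monthly_dates(start: datetime, end: datetime) -> List[datetime]:
    """First day of every month from start to end, inclusive (g = month)."""
    dates, y, m = [], start.year, start.month
    while (y, m) <= (end.year, end.month):
        dates.append(datetime(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return dates


def snapshots_at(revisions: List[Revision],
                 dates: List[datetime]) -> Dict[datetime, Dict[str, str]]:
    """For each date t_k, keep the latest revision of every page made before t_k."""
    ordered = sorted(revisions, key=lambda rev: rev[1])
    result: Dict[datetime, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    i = 0
    for t_k in sorted(dates):
        # Apply all revisions created before this snapshot date
        while i < len(ordered) and ordered[i][1] < t_k:
            title, _, text = ordered[i]
            current[title] = text          # a newer revision replaces the older one
            i += 1
        result[t_k] = dict(current)        # copy so later updates do not leak back
    return result
```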
Example [Bunescu and Pasca, EACL'2006]
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) Titles for which at least 75% of the occurrences in the article text itself are capitalized
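The three heuristics can be sketched as one predicate over a page title and its article text. The small function-word list and the sentence-start test (which excludes sentence-initial occurrences from rule 3) are simplifications chosen for this example; the rules follow the Bunescu and Pasca description above rather than any released code.

```python
import re

# Hypothetical, deliberately small list of words ignored by rule 1
FUNCTION_WORDS = {"of", "the", "a", "an", "and", "or", "in", "on", "at", "for",
                  "to", "by", "with", "from", "not", "who", "which", "that"}


def is_named_entity(title: str, article_text: str) -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized
        return all(w[0].isupper() for w in words if w.lower() not in FUNCTION_WORDS)
    word = words[0]
    # Rule 2: a single-word title with at least two capital letters
    if sum(c.isupper() for c in word) >= 2:
        return True
    # Rule 3: >= 75% of occurrences not at a sentence start are capitalized
    hits = [m for m in re.finditer(r"\b" + re.escape(word) + r"\b",
                                   article_text, re.IGNORECASE)
            if not re.search(r"(^|[.!?]\s+)$", article_text[:m.start()])]
    if not hits:
        return False
    capitalized = sum(m.group(0)[0].isupper() for m in hits)
    return capitalized / len(hits) >= 0.75
```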
Example
e_i: President_of_the_United_States
t_k: 11/2001
s_j: "George W. Bush"
ξ_i = (e_i, s_j), i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity e_i ∈ E_tk, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time t_k, i.e., a synonym snapshot S_tk = {ξ_11, …, ξ_nm}
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
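Anchor-text extraction could be sketched with a regular expression over wikitext links of the form `[[Target|anchor]]` and `[[Target]]`, where the anchor defaults to the target title. Normalizing spaces to underscores in the target is an assumption that covers the example above; templates, section links beyond stripping `#...`, and first-letter case folding are not handled.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set

# [[Target]], [[Target|anchor]] or [[Target#Section|anchor]]
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(pages: Iterable[str], entities: Set[str]) -> Dict[str, Counter]:
    """Map each entity title to a Counter of anchor texts that link to it."""
    synonyms: Dict[str, Counter] = {}
    for text in pages:
        for m in LINK.finditer(text):
            target = m.group(1).strip().replace(" ", "_")
            if target not in entities:
                continue
            anchor = (m.group(2) or m.group(1)).strip()
            synonyms.setdefault(target, Counter())[anchor] += 1
    return synonyms
```

Accumulating these counters over all pages of one snapshot W_tk gives the synonym snapshot S_tk described in Step 2.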
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
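Because a synonym's time period comes from the timestamps of the snapshots it appears in, the periods can be computed as the first and last snapshot month containing each relationship. A sketch, assuming a hypothetical mapping from snapshot month to the set of (entity, synonym) pairs found in it; gaps between appearances are not tracked.

```python
from typing import Dict, Set, Tuple

Month = Tuple[int, int]        # (year, month)
Relation = Tuple[str, str]     # (entity, synonym)


def time_periods(snapshots: Dict[Month, Set[Relation]]) -> Dict[Relation, Tuple[Month, Month]]:
    """First and last snapshot month in which each relationship appears."""
    periods: Dict[Relation, Tuple[Month, Month]] = {}
    for month in sorted(snapshots):             # chronological order
        for rel in snapshots[month]:
            first = periods[rel][0] if rel in periods else month
            periods[rel] = (first, month)       # latest month seen so far
    return periods


def format_period(period: Tuple[Month, Month]) -> str:
    """Render a period in the MM/YYYY - MM/YYYY style of the table above."""
    (y0, m0), (y1, m1) = period
    return f"{m0:02d}/{y0} - {m1:02d}/{y1}"
```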
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of ξ_ij by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
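The input stream for burst detection can be prepared by counting, per month, how many articles mention a synonym (r_t) out of all articles that month (d_t); the ratio r_t/d_t is the rate of occurrence. Case-insensitive substring matching is a simplification assumed here, and the `(month, text)` input format is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Month = Tuple[int, int]   # (year, month)


def occurrence_stream(docs: Iterable[Tuple[Month, str]], phrase: str,
                      months: List[Month]) -> Tuple[List[int], List[int]]:
    """Return (r, d): per-month counts of articles mentioning `phrase` and of all articles."""
    total: Counter = Counter()
    hits: Counter = Counter()
    needle = phrase.lower()
    for month, text in docs:
        total[month] += 1
        if needle in text.lower():     # simple substring match (assumption)
            hits[month] += 1
    return [hits[m] for m in months], [total[m] for m in months]
```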
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$TDP(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: Only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms within the particular time period $t_k$ are of interest.
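For comparison, a sketch of TDP ranking under the same hypothetical data layout (synonym mapped to {time partition: term frequency}). Only the frequency at the requested time t_k matters, and synonyms that do not occur at t_k are dropped; the function names and the counts in the test are invented for illustration.

```python
def tdp(tf_by_time, t_k):
    """TDP(s_j, t_k) = tf(s_j, t_k); zero if s_j does not occur at t_k."""
    return tf_by_time.get(t_k, 0)


def rank_time_dependent(synonyms, t_k, k=None):
    """Synonyms occurring at t_k, as (name, TDP) pairs sorted by descending TDP."""
    scored = [(s, tdp(tf, t_k)) for s, tf in synonyms.items()]
    scored = sorted((pair for pair in scored if pair[1] > 0),
                    key=lambda pair: pair[1], reverse=True)
    return scored if k is None else scored[:k]
```

Unlike TIDP, the same synonym can rank differently depending on the query time t_k.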
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms: data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection: TREC Robust Track (2004), 250 topics (topics 301-450 and topics 601-700)
Tools: Terrier, an open-source search engine developed by the University of Glasgow
Retrieval model: BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\; s_1{}^\wedge w_1\; s_2{}^\wedge w_2 \ldots s_k{}^\wedge w_k$
Measurement: Mean Average Precision (MAP), R-precision and Recall
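A sketch of building the expanded query string $q_{exp}$ from (synonym, TIDP score) pairs. The `term^weight` form follows the formula above; quoting multi-word synonyms as phrases and printing weights with two decimals are assumptions about the target engine's query syntax, and `expand_query` is a hypothetical helper.

```python
def expand_query(q_org, weighted_synonyms, k=3):
    """Append the top-k synonyms to q_org, each boosted by its TIDP score.

    weighted_synonyms: iterable of (synonym, TIDP score) pairs.
    Multi-word synonyms are quoted as phrases (an assumption, not from the slides).
    """
    top = sorted(weighted_synonyms, key=lambda pair: pair[1], reverse=True)[:k]
    terms = [f'"{s}"^{w:.2f}' if " " in s else f"{s}^{w:.2f}" for s, w in top]
    return " ".join([q_org] + terms)


print(expand_query("Barack Obama",
                   [("Senator Obama", 2.5), ("Barack Hussein Obama II", 3.5)]))
```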
Query expansion using time-dependent synonyms: data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (named entity and time period) with their synonyms:
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
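A sketch of how a temporal query could be expanded with time-dependent synonyms, using a few rows copied from the table above. Treating year ranges as inclusive and selecting every synonym whose period overlaps the query period is an assumption; `expand_temporal_query` is a hypothetical helper, not the system used in the experiments.

```python
# (named entity, (first year, last year), synonym), copied from the table above.
TEMPORAL_SYNONYMS = [
    ("Barack Obama", (2005, 2007), "Senator Obama"),
    ("Eminem", (1999, 2004), "Slim Shady"),
    ("Kmart", (1987, 1987), "Kresge"),
    ("Pope Benedict XVI", (1988, 2005), "Cardinal Ratzinger"),
]


def expand_temporal_query(w_q, t_q, table=TEMPORAL_SYNONYMS):
    """Return w_q plus every synonym whose period overlaps t_q = (start, end)."""
    start_q, end_q = t_q
    synonyms = [syn for entity, (start, end), syn in table
                if entity == w_q and start <= end_q and start_q <= end]
    return [w_q] + synonyms


print(expand_temporal_query("Pope Benedict XVI", (1990, 1995)))
```

A query period outside every synonym's period leaves the query unexpanded.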
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
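The "Avg. #Syn per NE" column appears to be the ratio of the two count columns; the short check below recomputes it from the table values (the helper name is hypothetical).

```python
def avg_synonyms_per_entity(num_entities, num_relationships):
    """Average number of synonyms per named entity (#NE-Syn / #NE)."""
    return num_relationships / num_entities


for method, n_ne, n_syn in [("BPF-NERW", 2_574_319, 3_199_115),
                            ("BPCF-NERW", 473_829, 488_383)]:
    print(method, round(avg_synonyms_per_entity(n_ne, n_syn), 2))
```

This prints 1.24 for BPF-NERW and 1.03 for BPCF-NERW, which round to the 1.2 and 1.0 shown in the table.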
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ); k = 2, since k > 2 brings noise into the NERQ process
Number of queries using the two different NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
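The two query-recognition methods can be sketched as follows. This is only an illustration of the idea: `related` is a hypothetical callable standing in for whatever retrieval of related Wikipedia pages was used (the slides do not specify it), and title matching here is simply case-insensitive with underscores treated as spaces.

```python
def _normalize(title):
    """Compare titles case-insensitively, treating underscores as spaces."""
    return title.replace("_", " ").strip().lower()


def recognize_query_entities(query, titles, related=None, k=2):
    """Map a query to Wikipedia pages.

    related=None: exact title match only (in the spirit of MW-NERQ).
    related given: also keep the top-k pages it returns (in the spirit of MRW-NERQ).
    `related` is a hypothetical callable returning page titles ranked by relatedness.
    """
    by_name = {_normalize(t): t for t in titles}
    pages = []
    if _normalize(query) in by_name:
        pages.append(by_name[_normalize(query)])
    if related is not None:
        for page in related(query)[:k]:
            if page not in pages:
                pages.append(page)
    return pages
```

Adding related pages lets more queries be treated as named-entity queries, which matches the direction of the counts above (42 vs. 149), at the cost of possible noise for larger k.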
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance level p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonym query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonym TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
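Relative improvements are easier to compare than raw scores. A small helper (hypothetical name) computes them from the MW-NERQ MAP column above:

```python
def relative_gain(baseline, system):
    """Relative improvement of `system` over `baseline`."""
    return (system - baseline) / baseline


# MAP values for MW-NERQ queries, copied from the table above.
map_mw = {"PM": 0.2889, "PRF": 0.3469, "SQE-PRF": 0.3608, "SWQE-PRF": 0.3653}
for method in ("PRF", "SQE-PRF", "SWQE-PRF"):
    print(f"{method}: {relative_gain(map_mw['PM'], map_mw[method]):+.1%} MAP over PM")
```

This gives roughly +20.1% (PRF), +24.9% (SQE-PRF) and +26.4% (SWQE-PRF) over PM. The gain is not uniform across query sets: for MRW-NERQ, PRF (0.3002) has a higher MAP than SWQE-PRF (0.2885).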
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a temporal query expanded with synonyms w.r.t. time $t_q$
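For reference, precision at k and its mean over a query set can be computed as below. This is the standard definition, sketched with hypothetical function names; dividing by k even when fewer than k documents are retrieved is the usual convention.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_precision_at_k(runs, k):
    """Average P@k over queries; runs is a list of (ranked ids, relevant id set)."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

For example, a P@10 of 0.52 over the 20 queries means that, on average, 5.2 of the top 10 documents were relevant.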
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
What are time-based synonyms?
Time-independent synonyms are invariant to time.
Time-dependent synonyms are relevant to a particular time period, i.e., entity-synonym relationships change over time.
Application
News archive search: search terms are named entities, and the publication dates of documents serve as temporal criteria.
Scenario 1: Query “Pope Benedict XVI”, restricted to documents written before 2005. Documents about “Joseph Alois Ratzinger” are relevant.
Scenario 2: Query “Hillary R. Clinton”, restricted to documents written from 1997 to 2002. Documents about “New York Senator” and “First Lady of the United States” are relevant.
Challenge: semantic gaps when searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time.
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
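Step 1 can be sketched as follows. The sketch assumes each revision is available as a (page title, timestamp, text) tuple, for instance after loading a dump with MWDumper, and takes the snapshot at month $t_k$ to be the latest revision of every page before the first day of that month; both the tuple layout and the cut-off rule are assumptions, and the function names are hypothetical.

```python
from datetime import datetime


def month_starts(first, last):
    """All (year, month) pairs from `first` to `last` inclusive."""
    (y, m), months = first, []
    while (y, m) <= last:
        months.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months


def snapshot(revisions, year, month):
    """W_{t_k}: the latest revision of every page saved before the cut-off date."""
    cutoff = datetime(year, month, 1)
    latest = {}
    for title, ts, text in revisions:
        if ts < cutoff and (title not in latest or ts > latest[title][0]):
            latest[title] = (ts, text)
    return {title: text for title, (_, text) in latest.items()}


print(len(month_starts((2001, 3), (2008, 3))))  # monthly snapshots 03/2001 to 03/2008
```

Monthly partitioning from 03/2001 to 03/2008 yields 85 snapshots, matching the data collection described earlier.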
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
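The three heuristics can be sketched as a single check. This is a simplified illustration: the set of function words exempt from rule 1 is a short, incomplete list chosen here, and rule 3 counts every occurrence, including those at the start of a sentence, which the original heuristic treats more carefully.

```python
import re

# Words exempt from the capitalization check in rule 1 (illustrative, incomplete list).
FUNCTION_WORDS = {"of", "the", "a", "an", "and", "or", "in", "on", "for", "to", "at", "by", "not"}


def is_named_entity(title, article_text):
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with multiple capital letters (e.g. UNICEF, WHO).
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Rule 3: at least 75% of the title's occurrences in the text are capitalized.
    hits = re.findall(rf"\b{re.escape(word)}\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    return sum(h[0].isupper() for h in hits) / len(hits) >= 0.75
```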
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States.
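A minimal sketch of both steps for one snapshot, assuming page texts are available as raw wiki markup (for example from the `snapshot` dictionary of a monthly partition). The regular expression only handles plain `[[Target]]` and `[[Target|anchor]]` links, drops section anchors, and ignores templates and Wikipedia's first-letter case rules; `synonym_snapshot` is a hypothetical name.

```python
import re
from collections import defaultdict

# [[Target]] or [[Target|anchor text]]; a "#section" suffix on the target is dropped.
LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]")


def synonym_snapshot(pages, entities):
    """S_{t_k}: map each entity title to the set of anchor texts linking to it."""
    snapshot = defaultdict(set)
    for text in pages.values():
        for target, anchor in LINK.findall(text):
            target = target.strip().replace(" ", "_")
            if target in entities:
                # A link without an explicit anchor contributes the title itself.
                snapshot[target].add((anchor or target.replace("_", " ")).strip())
    return dict(snapshot)
```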
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years of revisions) in which they appear, not temporal expressions extracted from the contents.
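Time periods like those in the table can be derived from a series of synonym snapshots. The sketch below assumes a relationship's period simply runs from the first to the last monthly snapshot in which it appears, ignoring any gaps in between; that rule and the function name are assumptions.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to the (first, last) snapshot time containing it.

    snapshots: {t_k: {entity: set of synonyms}}, with sortable t_k such as (2005, 5).
    """
    periods = {}
    for t_k in sorted(snapshots):
        for entity, synonyms in snapshots[t_k].items():
            for synonym in synonyms:
                first, _ = periods.get((entity, synonym), (t_k, t_k))
                periods[(entity, synonym)] = (first, t_k)
    return periods
```

Because these periods come from article timestamps, they can only start in 2001 or later, which is one reason for refining them with the NYT corpus.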
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
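The experiments used Kleinberg's implementation with two states, a rate ratio of 2 and γ = 1. The sketch below is a compact, independent rendering of the batched two-state variant of that algorithm, not the code used in the experiments: treating each month of NYT articles as one batch with a binomial emission model is an assumption, burst weights are not computed, and the function names are hypothetical.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (base state) or 1 (burst) with a two-state automaton.

    relevant[t]: documents in batch t that mention the synonym
    totals[t]:   all documents in batch t (e.g. one month of NYT articles)
    """
    n, total = len(relevant), sum(totals)
    p0 = sum(relevant) / total if total else 0.0     # base rate over the stream
    if not 0 < p0 < 1:
        return [0] * n
    p = [p0, min(s * p0, 0.9999)]                    # burst rate = s times base rate
    up = gamma * math.log(n)                         # cost of entering state 1

    def cost(i, r, d):
        # Binomial negative log-likelihood of r out of d under rate p[i]
        # (the binomial coefficient is equal for both states and is dropped).
        return -(r * math.log(p[i]) + (d - r) * math.log(1 - p[i]))

    best = [cost(0, relevant[0], totals[0]), up + cost(1, relevant[0], totals[0])]
    back = []                                        # best predecessor per state
    for r, d in zip(relevant[1:], totals[1:]):
        prev0 = 0 if best[0] <= best[1] else 1       # moving down costs nothing
        prev1 = 0 if best[0] + up < best[1] else 1
        best = [best[prev0] + cost(0, r, d),
                best[prev1] + (up if prev1 == 0 else 0.0) + cost(1, r, d)]
        back.append((prev0, prev1))
    state = 0 if best[0] <= best[1] else 1           # Viterbi backtracking
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    return states[::-1]


def bursty_intervals(states):
    """(start, end) batch indices of maximal runs of state 1."""
    intervals, start = [], None
    for t, st in enumerate(states + [0]):            # sentinel closes a final run
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    return intervals
```

For eight monthly batches of 100 articles with 1, 1, 1, 10, 12, 11, 1, 1 mentions, this sketch marks batches 3 to 5 as a single bursty interval.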
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition:
Class A, time-independent: robust to change over time and good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”.
Class B, time-dependent: related to a particular time in the past and good synonym candidates for a temporal search, where changes in semantics must be considered.
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005.
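The slide does not state how a synonym is assigned to Class A or Class B. Purely as an illustration, the hypothetical rule below calls a synonym time-independent when it appears in at least a given fraction of the time partitions in which its entity exists; both the coverage criterion and the 0.5 threshold are assumptions, not the method used in the paper.

```python
def classify_synonym(synonym_partitions, entity_partitions, threshold=0.5):
    """Return 'A' (time-independent) or 'B' (time-dependent) for one synonym.

    Hypothetical rule: coverage of the entity's partitions >= threshold -> Class A.
    """
    entity = set(entity_partitions)
    if not entity:
        return "B"
    coverage = len(set(synonym_partitions) & entity) / len(entity)
    return "A" if coverage >= threshold else "B"
```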
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, …, 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
- The burst detection algorithm implemented by Kleinberg, with number of states = 2, ratio of the rate of the second state to the base state = 2, ratio of the rate of each subsequent state to the previous state = 2, and gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools
- Terrier, an open source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org}\ s_1^{w_1}\ s_2^{w_2} \cdots s_k^{w_k}$$
Measurement: Mean Average Precision (MAP), R-precision and Recall
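A minimal sketch of building the expanded query string $q_{exp}$ from TIDP-scored synonyms. It assumes the engine accepts a `term^weight` boost, as in the formula above. Wrapping multi-word synonyms in double quotes, and whether a given Terrier version accepts a boost on a quoted phrase, are assumptions. The helper name `expand_query`, the two-decimal weight format and the scores in the example are hypothetical.

```python
def expand_query(q_org: str, synonym_scores: dict[str, float], k: int) -> str:
    """Append the top-k synonyms to q_org, each boosted by its TIDP score.

    synonym_scores maps a synonym to its TIDP weight (hypothetical values).
    """
    # Highest TIDP first; ties broken alphabetically for a stable query string.
    top = sorted(synonym_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Assumption: multi-word synonyms are quoted so they are matched as phrases.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


print(expand_query("barack obama",
                   {"Barack Hussein Obama II": 0.9, "Obama": 0.5, "Sen Barack Obama": 0.3},
                   k=2))
```

Keeping $q_{org}$ unweighted and boosting only the added synonyms mirrors the $q_{exp}$ formula. How raw TIDP values should be scaled relative to the original terms is left open here.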
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com, containing more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: Precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (a temporal query is a named entity plus a time period):
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with the filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
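A minimal sketch of the BPF-NERW filtering step. The slide lists two criteria but does not say whether they are combined with AND or OR. This sketch assumes a relationship is discarded only when both hold: it spans fewer than 6 monthly snapshots and its average frequency is below 2. "Average frequency" is taken to mean the mean over the months in which the relationship occurs, which is also an assumption. The names `keep_relationship` and `month_index` are hypothetical.

```python
def month_index(ym: tuple[int, int]) -> int:
    """Map (year, month) to a running month number."""
    year, month = ym
    return year * 12 + (month - 1)


def keep_relationship(monthly_tf: dict[tuple[int, int], int],
                      min_months: int = 6, min_avg_tf: float = 2.0) -> bool:
    """Return False for entity-synonym relationships that look like noise.

    monthly_tf maps (year, month) of each snapshot where the relationship
    occurs to its frequency in that snapshot.
    """
    if not monthly_tf:
        return False
    months = [month_index(ym) for ym in monthly_tf]
    span = max(months) - min(months) + 1  # inclusive number of months covered
    avg_tf = sum(monthly_tf.values()) / len(monthly_tf)
    # Assumed reading: drop only when BOTH criteria are met.
    return not (span < min_months and avg_tf < min_avg_tf)
```

Under the alternative OR reading, the return line would become `return span >= min_months and avg_tf >= min_avg_tf`, which filters more aggressively.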
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; k > 2 brings noise into the NERQ process
Number of queries recognized by the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance level p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | .2889 | .3309 | .6185 | .2455 | .2904 | .5629
PRF | .3469 | .3711 | .6944 | .3002 | .3227 | .6761
SQE-PRF | .3608 | .3652 | .7405 | .2507 | .2665 | .5932
SWQE-PRF | .3653 | .3861 | .7388 | .2885 | .3080 | .6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
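The three measures in the table follow their standard textbook definitions. A minimal sketch of those definitions for a single topic, plus MAP as the mean over topics, is below. It is not the evaluation tool used in the experiments. The ranking is assumed to already be cut off at whatever depth the evaluation uses, and the function names are hypothetical.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document,
    divided by the total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def r_precision(ranking: list[str], relevant: set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def recall(ranking: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking) & relevant) / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP over topics; runs holds one (ranking, relevant set) pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For the ranking `["d1", "d2", "d3", "d4"]` with relevant set `{"d1", "d3", "d5"}`, AP is (1/1 + 2/3) / 3 = 5/9, and both R-precision and recall are 2/3.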
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
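A minimal sketch of precision at k, averaged over queries, as used in the table. The document IDs in the check are hypothetical. The check also shows that a single ranking with exactly one relevant document in the top 10 and none at ranks 11-30 gives 1/10, 1/20 and 1/30, the same pattern as the TQ row.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    The denominator is always k, even if fewer than k documents were retrieved.
    """
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average P@k over queries; runs holds one (ranking, relevant set) pair per query."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# One relevant document at rank 1, followed by 29 non-relevant ones (hypothetical IDs).
ranking = ["d1"] + [f"n{i}" for i in range(29)]
print([round(precision_at_k(ranking, {"d1"}, k), 4) for k in (10, 20, 30)])
```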
QUEST Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Application
News archive search: search terms are named entities, and the publication dates of documents are temporal criteria.
Scenario 1: Query "Pope Benedict XVI", documents written before 2005. Documents about "Joseph Alois Ratzinger" are relevant.
Scenario 2: Query "Hillary R. Clinton", documents written from 1997 to 2002. Documents about "New York Senator" and "First Lady of the United States" are relevant.
Challenge
Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
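A minimal sketch of Step 1. It assumes a snapshot $W_{t_k}$ holds, for every page, its latest revision not newer than $t_k$, which is one reading of "current revisions at time $t_k$". Snapshot times are assumed to be the first day of each month, matching the dates listed in the experiment setting. Revisions are plain (title, timestamp, text) tuples rather than an actual MWDumper or Berkeley DB record format, and the function names are hypothetical.

```python
from datetime import datetime


def monthly_snapshot_times(start: datetime, end: datetime) -> list[datetime]:
    """Return the first day of every month from start's month to end's month, inclusive."""
    times: list[datetime] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        times.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return times


def snapshot(revisions, t_k: datetime) -> dict[str, str]:
    """Build W_{t_k}: for each page, the text of its latest revision not newer than t_k.

    revisions is an iterable of (page_title, timestamp, text) tuples.
    """
    current: dict[str, tuple[datetime, str]] = {}
    for title, ts, text in revisions:
        if ts <= t_k and (title not in current or ts > current[title][0]):
            current[title] = (ts, text)
    return {title: text for title, (_ts, text) in current.items()}
```

Monthly snapshots from March 2001 through March 2008 give 7 × 12 + 1 = 85 time points, consistent with the 85 snapshots reported in the experiment setting.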
Example heuristics [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
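A minimal sketch of the three title heuristics listed above. It assumes "all words are capitalized" refers to content words, as the President_of_the_United_States example implies, and it uses a small hypothetical stop-word list. Heuristic 3 here counts every occurrence of the word in the article text. The original heuristic may treat some positions, such as sentence starts, differently. `is_named_entity_title` is a hypothetical name.

```python
import re

# Hypothetical stop-word list; the slide does not give one.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "the", "to"}


def is_named_entity_title(title: str, article_text: str, threshold: float = 0.75) -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Heuristic 1: every content word of a multi-word title is capitalized.
        content = [w for w in words if w.lower() not in STOP_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Heuristic 2: a single-word title with more than one capital letter (UNICEF, WHO).
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # Heuristic 3: at least `threshold` of the word's occurrences in the article are capitalized.
    occurrences = re.findall(rf"\b{re.escape(word)}\b", article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= threshold
```

For example, "Apple" in a text that mostly says "apple" falls below the 75% threshold, while "Paris" written with a capital everywhere passes.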
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
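A minimal sketch of Steps 1 and 2. It pulls `[[Target|anchor text]]` and `[[Target]]` links out of wiki markup with a regular expression and counts (entity, anchor text) pairs for one snapshot. The regex deliberately ignores nested links and templates, and the title normalization is a simplified assumption. A plain `[[Target]]` link contributes the title itself as a synonym, which matches the output table listing "Pope Benedict XVI" as a synonym of itself. `synonym_snapshot` and `normalize_title` are hypothetical names.

```python
import re
from collections import Counter

# [[Target]], [[Target|anchor]] or [[Target#Section|anchor]]; nested markup is not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def normalize_title(title: str) -> str:
    """Simplified title normalization: spaces to underscores, first letter upper-cased."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def synonym_snapshot(pages: dict[str, str], entities: set[str]) -> Counter:
    """S_{t_k}: count (entity, anchor text) pairs over all article texts in one snapshot."""
    relationships: Counter = Counter()
    for text in pages.values():
        for target, anchor in LINK_RE.findall(text):
            entity = normalize_title(target)
            if entity in entities:
                # A link without an anchor contributes the target title itself.
                surface = (anchor or target).strip()
                relationships[(entity, surface)] += 1
    return relationships
```

Keeping counts rather than a plain set lets the same snapshot later supply the per-partition term frequencies that TIDP and TDP need.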
Output: entity-synonym relationships and their time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the article contents.
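A minimal sketch of deriving a time period per relationship from the monthly synonym snapshots. It assumes the period is simply the first to the last snapshot month in which the relationship appears, with gaps ignored. The snapshot input in the check is a toy example built to reproduce two rows of the table. `time_periods` and `fmt` are hypothetical names.

```python
Month = tuple[int, int]       # (year, month)
Relation = tuple[str, str]    # (entity, synonym)


def time_periods(snapshots: dict[Month, set[Relation]]) -> dict[Relation, tuple[Month, Month]]:
    """Map each relationship to (first month seen, last month seen)."""
    periods: dict[Relation, tuple[Month, Month]] = {}
    for month in sorted(snapshots):  # chronological order
        for rel in snapshots[month]:
            first = periods[rel][0] if rel in periods else month
            periods[rel] = (first, month)  # later months overwrite the end point
    return periods


def fmt(month: Month) -> str:
    """Format (year, month) as MM/YYYY, as in the table above."""
    year, mon = month
    return f"{mon:02d}/{year}"
```

Because the time period comes only from snapshot timestamps, it can be no more precise than the monthly granularity and no earlier than Wikipedia's own history. This limitation motivates the burst-detection step that follows.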
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
- 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
- Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
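A minimal two-state re-implementation of Kleinberg-style burst detection over batched counts, such as NYT articles per month. It is not the Kleinberg implementation used in the experiments. The parameters s = 2 and gamma = 1 mirror the settings listed in the experiment setting. Each batch t has $d_t$ documents, $r_t$ of which mention the synonym. The base state emits at the overall rate and the bursty state at s times that rate. Entering the bursty state costs $\gamma \ln n$. The optimal state sequence is found with a Viterbi pass. Scoring a burst by the total cost saved relative to the base state is one common weighting, and treating it as the source of the "Burst Weight" column is an assumption.

```python
import math


def kleinberg_two_state(r: list[int], d: list[int], s: float = 2.0, gamma: float = 1.0):
    """Two-state burst detection over batched counts.

    r[t]: documents in batch t that mention the synonym; d[t]: all documents in batch t.
    Returns (start, end, weight) triples with inclusive batch indices.
    """
    n = len(r)
    total_r, total_d = sum(r), sum(d)
    if n == 0 or total_d == 0 or total_r == 0 or total_r == total_d:
        return []  # no contrast between a base rate and a bursty rate
    p0 = total_r / total_d          # base state: overall mention rate
    p1 = min(s * p0, 1 - 1e-9)      # bursty state: s times the base rate

    def cost(p: float, rt: int, dt: int) -> float:
        # Negative binomial log-likelihood; the binomial coefficient is omitted because
        # it is identical for both states and cancels in every comparison.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    c = [(cost(p0, r[t], d[t]), cost(p1, r[t], d[t])) for t in range(n)]
    up = gamma * math.log(n)        # cost of moving from the base to the bursty state
    best = [[0.0, 0.0] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    best[0] = [c[0][0], c[0][1] + up]  # the automaton starts in the base state
    for t in range(1, n):
        # Entering the base state is free from either state.
        prev0 = 0 if best[t - 1][0] <= best[t - 1][1] else 1
        best[t][0] = best[t - 1][prev0] + c[t][0]
        back[t][0] = prev0
        # Staying bursty is free; rising from the base state costs `up`.
        stay, rise = best[t - 1][1], best[t - 1][0] + up
        back[t][1] = 1 if stay <= rise else 0
        best[t][1] = min(stay, rise) + c[t][1]

    # Backtrack the optimal state sequence.
    q = 0 if best[-1][0] <= best[-1][1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = q
        q = back[t][q]

    # Report maximal runs of the bursty state with their weight (cost saved vs. base state).
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t + 1 < n and states[t + 1] == 1:
                t += 1
            weight = sum(c[u][0] - c[u][1] for u in range(start, t + 1))
            bursts.append((start, t, weight))
        t += 1
    return bursts
```

With toy counts `r = [1, 1, 1, 10, 12, 11, 1, 1]` out of 100 documents per batch, the sketch reports one burst covering batches 3 to 5. A flat series produces no burst.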
Output: results from the burst detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A, time-independent: robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B, time-dependent: related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
where $pf(s_j)$ is the number of time partitions in which $s_j$ occurs, and $\overline{tf}(s_j)$ is the average term frequency of $s_j$ over those partitions:
$$\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
$\mu$ weights the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: The model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high average frequency over time
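A minimal sketch of the TIDP score and the resulting ranking. The slide does not say whether $pf$ and $\overline{tf}$ are normalized before they are mixed, so this sketch uses the raw values. The function names, partition labels and counts are hypothetical.

```python
def tidp(tf_by_partition: dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    tf_by_partition maps each time partition p_i to tf(s_j, p_i); zero entries are allowed.
    """
    pf = sum(1 for tf in tf_by_partition.values() if tf > 0)  # partitions where s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(tf_by_partition.values()) / pf               # sum over all p_i, divided by pf
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms: dict[str, dict[str, int]], mu: float = 0.5,
                          top_k: int = 3) -> list[tuple[str, float]]:
    """Rank an entity's synonyms by TIDP, highest first (ties alphabetically)."""
    scored = [(syn, tidp(parts, mu)) for syn, parts in synonyms.items()]
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:top_k]
```

For example, a synonym seen in two partitions with frequencies 4 and 2 scores 0.5 · 2 + 0.5 · 3 = 2.5. A synonym seen once with frequency 10 scores 5.5. Without normalization, a single very frequent partition can outweigh broad coverage over time.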
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. Syn. per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization”, or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1) Exactly matched Wikipedia page (MW-NERQ)
2) Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries using the two NER methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision, and Recall (statistical significance assessed at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (statistical significance assessed at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword w_q and a time t_q
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time t_q
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Application
News archive search: search terms are named entities, and the publication dates of documents are temporal criteria
Scenario 1: for the query “Pope Benedict XVI” restricted to documents written before 2005, documents about “Joseph Alois Ratzinger” are relevant
Scenario 2: for the query “Hillary R. Clinton” restricted to documents written from 1997 to 2002, documents about “New York Senator” and “First Lady of the United States” are relevant
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1) Formal models: Wikipedia viewed as a temporal resource
2) Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3) Experiments: evaluate extracting synonyms and improving their time; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
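Step 1 can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes the revision history has already been parsed into hypothetical (page title, revision timestamp, text) tuples, and it takes the snapshot $W_{t_k}$ to be the latest revision of every page at or before $t_k$, with $t_k$ the first day of each month.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: (page_title, revision_timestamp, revision_text).
Revision = Tuple[str, datetime, str]


def month_starts(first: datetime, last: datetime) -> List[datetime]:
    """Time points t_1..t_z at granularity g = month (first day of each month)."""
    points, year, month = [], first.year, first.month
    while (year, month) <= (last.year, last.month):
        points.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def build_snapshots(revisions: Iterable[Revision],
                    points: List[datetime]) -> Dict[datetime, Dict[str, str]]:
    """Snapshot W_tk: for each page, the text of its latest revision at or before t_k."""
    ordered = sorted(revisions, key=lambda rev: rev[1])
    snapshots: Dict[datetime, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    i = 0
    for t_k in points:
        # Replay every revision up to and including t_k onto the running page state.
        while i < len(ordered) and ordered[i][1] <= t_k:
            title, _, text = ordered[i]
            current[title] = text
            i += 1
        snapshots[t_k] = dict(current)  # copy so later revisions do not leak back
    return snapshots
```

Under these assumptions, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` yields 85 time points, which matches the 85 monthly snapshots of the Wikipedia history used in the experiments. Copying every page text per month is memory-hungry; a real pipeline would more likely stream the dump and persist snapshots on disk.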
Example [Bunescu and Pasca, EACL 2006]
1) Multi-word titles in which all words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) 75% of the title's occurrences in the article text itself are capitalized
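The three title heuristics can be approximated as follows. This is an assumption-laden sketch: the function name `is_named_entity` and the `FUNCTION_WORDS` list are hypothetical, heuristic 1 ignores function words (the example title contains the lowercase words “of” and “the”), and heuristic 3 is read as “at least 75% of the title's occurrences in the article text are capitalized”.

```python
import re

# Hypothetical list of function words that heuristic 1 ignores.
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def is_named_entity(title: str, article_text: str, threshold: float = 0.75) -> bool:
    """Approximate the three title heuristics for one Wikipedia page."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    # 1) Multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    # 2) Single-word title with multiple capital letters (e.g. UNICEF, WHO).
    if len(words) == 1 and sum(c.isupper() for c in words[0]) > 1:
        return True
    # 3) Share of whole-word occurrences of the title in the text that are capitalized.
    pattern = r"\b" + re.escape(" ".join(words)) + r"\b"
    hits = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for h in hits if h[0].isupper())
    return capitalized / len(hits) >= threshold
```

For instance, a page titled “Apple” whose text contains “Apple” twice and “apple” once reaches only 2/3 capitalized occurrences and is rejected at the 0.75 threshold.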
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate the entity-synonym relationships of all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is an anchor text linking to the article President_of_the_United_States
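A minimal sketch of both steps, assuming each snapshot is available as raw wikitext strings and that only piped links of the form [[Target|anchor]] are used. The regular expression, the title normalization (spaces to underscores), and the function name `extract_synonyms` are illustrative assumptions, not the authors' code.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set

# [[Target|anchor]] or [[Target#Section|anchor]]: group 1 = target, group 2 = anchor text.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def extract_synonyms(snapshot_texts: Iterable[str],
                     entities: Set[str]) -> Dict[str, Counter]:
    """Synonym snapshot S_tk: entity -> Counter of anchor texts that link to it."""
    synonyms: Dict[str, Counter] = defaultdict(Counter)
    for text in snapshot_texts:
        for target, anchor in PIPED_LINK.findall(text):
            entity = target.strip().replace(" ", "_")  # simplified title normalization
            if entity in entities:  # keep only links to recognized named entities
                synonyms[entity][anchor.strip()] += 1
    return synonyms
```

Counting anchors per entity also yields the frequencies that the ranking functions need. Generic anchors such as “the President” are captured as well, which is one reason the extracted relationships need filtering.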
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD 2002]
Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
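To make the burst step concrete, here is a minimal two-state sketch of Kleinberg's batched (binomial) burst model, written from the published formulation rather than taken from the implementation used in the talk. Assumed inputs: `r[t]` is the number of NYT articles in month t that mention the synonym and `d[t]` is the total number of articles that month. The defaults `s=2` and `gamma=1` mirror the parameters listed in the experiment setting, and each burst's weight is the total cost saved by being in the burst state.

```python
import math
from typing import List, Tuple


def _cost(r: int, d: int, p: float) -> float:
    """-ln P(r relevant documents out of d | rate p) under a binomial model."""
    log_comb = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_comb + r * math.log(p) + (d - r) * math.log(1.0 - p))


def detect_bursts(r: List[int], d: List[int], s: float = 2.0,
                  gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state burst detection; returns (start_batch, end_batch, weight) per burst."""
    n = len(r)
    if n == 0 or sum(d) == 0:
        return []
    p0 = sum(r) / sum(d)                 # base rate over the whole stream
    if p0 <= 0.0 or p0 >= 1.0:
        return []
    p1 = min(s * p0, 0.9999)             # burst state emits at s times the base rate
    rates = (p0, p1)
    up = gamma * math.log(n)             # cost of moving up a state; moving down is free
    cost = [[0.0, 0.0] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    for t in range(n):
        for j in (0, 1):
            if t == 0:
                best_prev, best = 0, (up if j == 1 else 0.0)  # start in the base state
            else:
                options = [cost[t - 1][i] + (up if j > i else 0.0) for i in (0, 1)]
                best_prev = 0 if options[0] <= options[1] else 1
                best = options[best_prev]
            cost[t][j] = best + _cost(r[t], d[t], rates[j])
            back[t][j] = best_prev
    # Viterbi backtrack of the minimum-cost state sequence.
    states = [0] * n
    states[-1] = 0 if cost[-1][0] <= cost[-1][1] else 1
    for t in range(n - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    # Maximal runs of the burst state become bursts.
    bursts: List[Tuple[int, int, float]] = []
    t = 0
    while t < n:
        if states[t] == 1:
            start, weight = t, 0.0
            while t < n and states[t] == 1:
                weight += _cost(r[t], d[t], p0) - _cost(r[t], d[t], p1)
                t += 1
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

With seven monthly batches of 100 articles and mention counts [1, 1, 1, 10, 12, 1, 1], the sketch reports a single burst covering batches 3 and 4. A flat series such as [5] * 6 produces no burst, because paying the transition cost never pays off.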
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$tf(s_j)$ is the tf of $s_j$ averaged over the partitions in which it occurs: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
$\mu$ weights the importance of the temporal feature against the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition
The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
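The TIDP score can be computed directly from per-partition term frequencies. The sketch below applies the formula literally, assuming a hypothetical input that maps partition labels to $tf(s_j, p_i)$; the slide does not say whether $pf$ and $tf$ are normalized before mixing, so no normalization is applied here.

```python
from typing import Dict


def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for a single synonym s_j."""
    # pf(s_j): number of time partitions p_i in which s_j occurs (tf > 0).
    occurring = {p: tf for p, tf in tf_by_partition.items() if tf > 0}
    pf = len(occurring)
    if pf == 0:
        return 0.0
    # tf(s_j) = sum_i tf(s_j, p_i) / pf(s_j): average tf over the partitions where s_j occurs.
    avg_tf = sum(occurring.values()) / pf
    return mu * pf + (1.0 - mu) * avg_tf
```

With $\mu = 0.5$, a synonym seen in two partitions with tf 4 and 2 scores 0.5·2 + 0.5·3 = 2.5, while one seen in a single partition with tf 10 scores 0.5·1 + 0.5·10 = 5.5. This shows how strongly an unnormalized frequency term can dominate the temporal one.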
Ranking time-dependent synonyms
Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition
Only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
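Ranking at a given time $t_k$ then reduces to sorting by $tf(s_j, t_k)$. In this sketch, the input format (synonym → {time label → tf}), the top-k cut-off, and the alphabetical tie-break are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def top_k_time_dependent(tf_at: Dict[str, Dict[str, int]], t_k: str,
                         k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k) and return the top k."""
    scored = [(s, per_time.get(t_k, 0)) for s, per_time in tf_at.items()]
    scored = [(s, tdp) for s, tdp in scored if tdp > 0]  # only synonyms used at t_k
    # Descending TDP; ties broken alphabetically so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

Because only the counts at $t_k$ matter, a synonym that was frequent years earlier but absent at $t_k$ is dropped entirely instead of being ranked low.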
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001 to 01/03/2008), about 28 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper: http://www.mediawiki.org/wiki/Mwdumper
Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg:
Number of states: 2
Ratio of the rate of the second state to the base state: 2
Ratio of the rate of each subsequent state to the previous state: 2
Gamma parameter of the HMM: 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{\mathrm{exp}} = q_{\mathrm{org}}\; s_1{\wedge}w_1\; s_2{\wedge}w_2 \cdots s_k{\wedge}w_k$$
where $s_i{\wedge}w_i$ boosts synonym $s_i$ by weight $w_i$
Measurement: Mean Average Precision (MAP), R-precision, and Recall
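The expanded query can be assembled as a string, as sketched below. The `s^w` rendering, the quoting of multi-word synonyms as phrases, and the function name `expand_query` are assumptions modelled on the boosting notation of $q_{\mathrm{exp}}$. The exact syntax a given engine's query parser accepts should be checked against its documentation.

```python
from typing import List, Tuple


def expand_query(q_org: str, synonyms: List[Tuple[str, float]], k: int) -> str:
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from (synonym, TIDP weight) pairs."""
    # Keep the k synonyms with the highest TIDP weight.
    top = sorted(synonyms, key=lambda sw: sw[1], reverse=True)[:k]
    # Multi-word synonyms are quoted as phrases; ^w is the boosting weight.
    parts = [q_org] + [f'"{s}"^{w:g}' if " " in s else f"{s}^{w:g}" for s, w in top]
    return " ".join(parts)
```

For example, expanding “Barack Obama” with k = 2 keeps only the two highest-weighted synonyms. Lower-weighted candidates are dropped rather than down-weighted, which limits query drift.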
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
1 Introduction
Application
News archive search: search terms are named entities, and the publication dates of documents are the temporal criteria
Scenario 2: the query "Hillary R. Clinton" for documents written from 1997 to 2002; documents about "New York Senator" and "First Lady of the United States" are relevant
Challenge
Semantic gaps in searching archives, i.e., a lack of knowledge about a query and its synonyms at a particular time
Contributions
1 Formal models: Wikipedia viewed as a temporal resource
2 Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3 Experiments: evaluate extracting synonyms and improving their time; evaluate query expansion using time-based synonyms
2 Synonym Detection
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
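Step 1 can be sketched as follows. The `Revision` record, the function names and the choice of the first day of each month as the time point $t_k$ are assumptions for illustration, not the authors' code: each snapshot keeps, for every page, its latest revision saved at or before $t_k$.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class Revision(NamedTuple):
    title: str           # Wikipedia page title
    timestamp: datetime  # time the revision was saved
    text: str            # wiki markup of this revision


def month_starts(first: datetime, last: datetime) -> List[datetime]:
    """First day of every month from `first` to `last`, inclusive."""
    points, year, month = [], first.year, first.month
    while (year, month) <= (last.year, last.month):
        points.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def monthly_snapshots(revisions: List[Revision], first: datetime,
                      last: datetime) -> Dict[datetime, Dict[str, Revision]]:
    """Build W = {W_t1, ..., W_tz}: at each time point t_k, the current
    revision of every page (its latest revision saved at or before t_k)."""
    ordered = sorted(revisions, key=lambda r: r.timestamp)
    snapshots: Dict[datetime, Dict[str, Revision]] = {}
    current: Dict[str, Revision] = {}
    i = 0
    for t_k in month_starts(first, last):
        # Consume revisions saved up to t_k, keeping the newest one per page.
        while i < len(ordered) and ordered[i].timestamp <= t_k:
            current[ordered[i].title] = ordered[i]
            i += 1
        snapshots[t_k] = dict(current)  # copy so later months do not alter it
    return snapshots
```

For example, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` returns 85 time points, matching the 85 monthly snapshots of the English Wikipedia history used in the experiments. Copying the whole page map per month keeps the sketch simple; a full revision history would need an incremental store instead.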
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) Titles for which 75% of the occurrences in the article text itself are capitalized
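The three capitalization heuristics can be sketched as below. The function-word list, the sentence-boundary test and the name `is_named_entity` are simplifying assumptions for illustration, not Bunescu and Pasca's or the authors' implementation. Rule 1 checks only content words, so "of" and "the" in President_of_the_United_States do not block it; rule 3 ignores sentence-initial occurrences, where capitalization carries no signal.

```python
import re

# Assumed, abbreviated list of function words ignored by rule 1.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "not",
                  "of", "on", "or", "the", "to"}


def _at_sentence_start(text: str, start: int) -> bool:
    """A match is sentence-initial if only whitespace separates it from the
    start of the text or from a preceding '.', '!' or '?'."""
    before = text[:start].rstrip()
    return before == "" or before[-1] in ".!?"


def is_named_entity(title: str, article_text: str) -> bool:
    words = title.replace("_", " ").split()
    if not words:
        return False
    # Rule 1: multi-word title whose content words are all capitalized.
    if len(words) > 1:
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters.
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: >= 75% of non-sentence-initial occurrences are capitalized.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = [m for m in pattern.finditer(article_text)
            if not _at_sentence_start(article_text, m.start())]
    if not hits:
        return False
    capitalized = sum(m.group()[0].isupper() for m in hits)
    return capitalized / len(hits) >= 0.75
```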
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_i$: $(e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate the entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text of a link to the article President_of_the_United_States.
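Both steps amount to scanning the wiki markup of every page in a snapshot for internal links. A minimal sketch, assuming raw wiki text per page and the set of named-entity titles from the recognition step; the regular expression covers only plain [[Target]] and [[Target|anchor]] links, and counting links per pair is an assumed stand-in for the synonym frequencies used later.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# [[Target]] or [[Target|anchor text]]; templates and nested links are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(pages: Iterable[str],
                     entities: Set[str]) -> Dict[Tuple[str, str], int]:
    """Collect entity-synonym relationships (e_i, s_j) for one snapshot:
    the anchor text of every link pointing to a named-entity page is taken
    as a synonym of that entity, counted by number of links."""
    pairs: Counter = Counter()
    for text in pages:
        for match in WIKILINK.finditer(text):
            # Normalize the target: drop any #section, use underscores.
            target = match.group(1).split("#")[0].strip().replace(" ", "_")
            anchor = (match.group(2) or match.group(1)).strip()
            if target in entities:
                pairs[(target, anchor)] += 1
    return dict(pairs)
```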
Extracting synonyms
Output: entity-synonym relationships and time periods

Named Entity            Synonym                      Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger    05/2005 - 03/2009
Pope Benedict XVI       Joseph Ratzinger             05/2005 - 03/2009
Pope Benedict XVI       Pope Benedict XVI            05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II      02/2007 - 03/2009
Barack Obama            Sen. Barack Obama            07/2007 - 03/2009
Barack Obama            Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton              08/2003 - 03/2009
Hillary Rodham Clinton  Sen. Hillary Clinton         03/2007 - 03/2009
Hillary Rodham Clinton  Senator Clinton              11/2007 - 03/2009

The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the article contents.
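Under that reading, a relationship's time period runs from the first to the last snapshot in which it was observed. A sketch with hypothetical names; it ignores gaps, so a pair that disappears and later reappears keeps one continuous period.

```python
from datetime import date
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (named entity, synonym)


def time_periods(snapshots: Dict[date, Set[Pair]]) -> Dict[Pair, Tuple[date, date]]:
    """Map each entity-synonym relationship to (first, last) timestamp of
    the synonym snapshots S_tk that contain it."""
    periods: Dict[Pair, Tuple[date, date]] = {}
    for t_k in sorted(snapshots):
        for pair in snapshots[t_k]:
            first, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (first, t_k)
    return periods
```

Formatting a bound with `f"{d:%m/%Y}"` gives the mm/yyyy style used in the table.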
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{ij}$ by computing the rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
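The burst detection step can be sketched with Kleinberg's two-state model for batched streams: each month is a batch, a synonym's share of that month's documents is scored against a base rate and a bursty rate s times higher, and a Viterbi pass pays γ·ln(n) to enter the bursty state. This is a compact re-implementation for illustration, using the settings listed in the experiment setting (2 states, s = 2, γ = 1), not Kleinberg's released code or the authors' pipeline; the function name and monthly-batch inputs are assumptions.

```python
import math
from typing import List, Tuple


def kleinberg_bursts(relevant: List[int], total: List[int],
                     s: float = 2.0, gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Two-state burst detection over batches (e.g. one batch per month).

    relevant[t]: documents in batch t that mention the synonym
    total[t]:    all documents in batch t
    Returns (first batch, last batch, weight) for each maximal bursty run.
    """
    n = len(total)
    if n == 0 or sum(total) == 0:
        return []
    p0 = sum(relevant) / sum(total)  # base rate of occurrence
    if not 0 < p0 < 1:
        return []
    p = [p0, min(p0 * s, 0.9999)]  # base and bursty rates, kept below 1

    def cost(state: int, t: int) -> float:
        # Negative log-likelihood of batch t in `state`; the binomial
        # coefficient is identical for both states and is omitted.
        r, d = relevant[t], total[t]
        return -(r * math.log(p[state]) + (d - r) * math.log(1 - p[state]))

    up = gamma * math.log(n)  # cost of moving up to the bursty state
    best = [cost(0, 0), up + cost(1, 0)]  # the automaton starts in state 0
    back: List[Tuple[int, int]] = []  # best predecessors of states 0 and 1
    for t in range(1, n):
        prev0 = 0 if best[0] <= best[1] else 1  # moving down is free
        prev1 = 0 if best[0] + up < best[1] else 1
        stay_or_up = best[0] + up if prev1 == 0 else best[1]
        best = [best[prev0] + cost(0, t), stay_or_up + cost(1, t)]
        back.append((prev0, prev1))

    # Trace the cheapest state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()

    bursts: List[Tuple[int, int, float]] = []
    t = 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            # Weight: total cost saved by the bursty state over the interval.
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

The weight of a burst follows Kleinberg's definition (the cost reduction of the bursty state relative to the base state over the interval), which is what makes it usable as an intensity score.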
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm

Synonym           Entity                  Burst Weight  Start    End
President Reagan  Ronald Reagan           5506858       01/1987  02/1989
President Ronald  Ronald Reagan           100401        01/1989  03/1990
President Ronald  Ronald Reagan           67208         07/1990  02/1993
Senator Clinton   Hillary Rodham Clinton  18214         01/2001  10/2001
Senator Clinton   Hillary Rodham Clinton  17732         05/2002  01/2003
Senator Clinton   Hillary Rodham Clinton  172356        06/2003  11/2004
3 Query Expansion
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B: time-dependent
Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot \mathrm{pf}(s_j) + (1 - \mu) \cdot \mathrm{tf}(s_j)$$
where $\mathrm{pf}(s_j)$ is the number of time partitions in which $s_j$ occurs, and $\mathrm{tf}(s_j)$ is the term frequency of $s_j$ averaged over those partitions:
$$\mathrm{tf}(s_j) = \frac{\sum_i \mathrm{tf}(s_j, p_i)}{\mathrm{pf}(s_j)}$$
$\mu$ weighs the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: the model measures the popularity of synonyms by two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
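A direct transcription of the TIDP formula, assuming per-partition term frequencies have already been counted. The slides do not say whether pf and tf are normalized before mixing, so raw values are used; the input format and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tidp_scores(occurrences: List[Tuple[str, str, int]],
                mu: float = 0.5) -> Dict[str, float]:
    """occurrences: (synonym, time partition, term frequency) records.
    TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j), where pf counts the
    partitions containing s_j and tf averages its frequency over them."""
    tf_by_partition: Dict[str, Dict[str, int]] = defaultdict(dict)
    for synonym, partition, tf in occurrences:
        if tf > 0:
            parts = tf_by_partition[synonym]
            parts[partition] = parts.get(partition, 0) + tf
    scores: Dict[str, float] = {}
    for synonym, parts in tf_by_partition.items():
        pf = len(parts)
        avg_tf = sum(parts.values()) / pf
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores
```

With μ = 0.5, a synonym seen in 2 partitions with frequencies 4 and 2 scores 0.5·2 + 0.5·3 = 2.5.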
Ranking time-dependent synonyms
Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = \mathrm{tf}(s_j, t_k)$$
where $\mathrm{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
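Ranking by TDP is a plain sort on the frequency at $t_k$. A sketch over a hypothetical (synonym, time point) → tf map; breaking ties alphabetically is an added assumption for deterministic output.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf_at: Dict[Tuple[str, str], int], t_k: str,
                        k: int = 3) -> List[Tuple[str, int]]:
    """Return the top-k synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    at_tk = [(s, tf) for (s, t), tf in tf_at.items() if t == t_k and tf > 0]
    # Highest frequency first; ties broken alphabetically.
    return sorted(at_tk, key=lambda item: (-item[1], item[0]))[:k]
```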
4 Evaluation
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus, which contains over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k
Measurement: Mean Average Precision (MAP), R-precision and Recall
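The expanded query can be assembled as a string in the s^w boosting notation above. A sketch, assuming TIDP scores per synonym are already available; `expand_query` is a hypothetical helper, and quoting multi-word synonyms as phrases is an assumption about the target query syntax rather than something the slides specify.

```python
from typing import Dict


def expand_query(original: str, tidp: Dict[str, float], k: int = 3) -> str:
    """Build q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k from the top-k
    synonyms by TIDP score (ties broken alphabetically)."""
    top = sorted(tidp.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original]
    for synonym, weight in top:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:g}")
    return " ".join(parts)
```

With hypothetical scores, `expand_query("Barack Obama", {"Barack Hussein Obama II": 2.5, "Sen. Barack Obama": 1.75, "Obama": 1.0}, k=2)` returns `Barack Obama "Barack Hussein Obama II"^2.5 "Sen. Barack Obama"^1.75`.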
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: Precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a temporal query is a named entity plus a time period)

Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H.W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications
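For a temporal query of this form, one simple way to pick expansion candidates is to keep the synonyms whose own time period overlaps the query period. This is an assumed selection rule for illustration, with year-level periods and hypothetical names; the slides do not spell out the exact procedure.

```python
from typing import Dict, List, Tuple

Period = Tuple[int, int]  # (first year, last year), inclusive


def overlaps(a: Period, b: Period) -> bool:
    """True if two inclusive year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporal_synonyms(entity: str, query_period: Period,
                      periods: Dict[Tuple[str, str], Period]) -> List[str]:
    """Synonyms of `entity` whose time period overlaps the query period,
    i.e. candidates for expanding a temporal query (w_q, t_q)."""
    return sorted(s for (e, s), p in periods.items()
                  if e == entity and overlaps(p, query_period))
```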
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method  #NE        #NE-Syn    Avg. #Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2               51
BPCF-NERW   473,829    488,383    1.0               73

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
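The BPF-NERW filtering criteria can be read as two exclusion rules. A sketch under that assumption: a relationship is dropped if its time interval is shorter than 6 months or its average frequency is below 2. The `Relationship` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relationship:
    entity: str
    synonym: str
    months: int           # length of the relationship's time period, in months
    avg_frequency: float  # average frequency of the synonym over that period


def apply_filters(rels: List[Relationship], min_months: int = 6,
                  min_avg_freq: float = 2.0) -> List[Relationship]:
    """Keep only relationships that pass both filtering criteria."""
    return [r for r in rels
            if r.months >= min_months and r.avg_frequency >= min_avg_freq]
```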
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries using the two different NER methods:

Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250
Query expansion using time-independent synonyms
MAP, R-precision and Recall (statistical significance tested at p < 0.05)

            MW-NERQ                         MRW-NERQ
Method      MAP     R-precision  Recall     MAP     R-precision  Recall
PM          0.2889  0.3309       0.6185     0.2455  0.2904       0.5629
PRF         0.3469  0.3711       0.6944     0.3002  0.3227       0.6761
SQE-PRF     0.3608  0.3652       0.7405     0.2507  0.2665       0.5932
SWQE-PRF    0.3653  0.3861       0.7388     0.2885  0.3080       0.6504

PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (statistical significance tested at p < 0.05)

Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800

TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms with respect to time $t_q$
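P@k is the fraction of the top-k retrieved documents that are relevant, averaged over the queries. A minimal sketch, assuming each run is a ranked list of document IDs paired with a set of relevant IDs (hypothetical inputs). A single ranking with exactly one relevant document in its top 10 gives 0.1000, 0.0500 and 0.0333, the same pattern as the TQ row.

```python
from typing import List, Set, Tuple


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_precision_at_k(runs: List[Tuple[List[str], Set[str]]], k: int) -> float:
    """P@k averaged over queries; each run is (ranking, relevant set)."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)
```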
QUEST: Query Expansion using Synonyms over Time
5 Conclusions
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Application
- News archive search
- Search terms are named entities
- Publication dates of documents are temporal criteria
Challenge
- Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
   - Discover time-based synonyms over time
   - Improve the accuracy of the time of synonyms
   - Expand a query using time-based synonyms
3. Experiments:
   - Evaluate extracting and improving the time of synonyms
   - Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
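Step 1 can be sketched in Python as below. This is a minimal illustration under assumptions the slide does not state: revisions are available as hypothetical `Revision(title, timestamp, text)` records, and each monthly snapshot $W_{t_k}$ is taken on the first day of the month and keeps the latest revision of every page at that moment (the "current revisions at time $t_k$" of the figure).

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical revision record: page title, revision timestamp and wikitext.
Revision = namedtuple("Revision", ["title", "timestamp", "text"])


def month_starts(first: datetime, last: datetime):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield datetime(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def snapshot_at(revisions, t_k: datetime):
    """Return {title: revision} with the current revision of each page at time t_k."""
    current = {}
    for rev in revisions:
        if rev.timestamp <= t_k:
            best = current.get(rev.title)
            if best is None or rev.timestamp > best.timestamp:
                current[rev.title] = rev
    return current


def partition_by_month(revisions, first: datetime, last: datetime):
    """Build W = {W_t1, ..., W_tz}, one snapshot per month (granularity g = month)."""
    return {t_k: snapshot_at(revisions, t_k) for t_k in month_starts(first, last)}
```

For example, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` yields 85 monthly dates. Rebuilding every snapshot from the full revision list is quadratic; over the complete revision history one would rather stream revisions in timestamp order.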
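Example [Bunescu and Pasca, EACL’2006]: a title is recognized as a named entity if
1) it is a multi-word title in which all words are capitalized (apart from function words such as prepositions and determiners), e.g., President_of_the_United_States ⇒ named entity;
2) it is a single-word title with multiple capital letters, e.g., UNICEF and WHO are named entities; or
3) at least 75% of its occurrences in the article text itself are capitalized.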
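The three heuristics can be expressed as a small predicate. This sketch paraphrases the rules above; the `FUNCTION_WORDS` list, the word-boundary matching and checking only the first letter of each occurrence are simplifying assumptions, not Bunescu and Pasca's exact procedure.

```python
import re

# Function words that may stay lowercase inside a multi-word title.
# This list is an illustrative assumption, not Bunescu and Pasca's exact list.
FUNCTION_WORDS = {"a", "an", "and", "at", "by", "for", "in", "not", "of", "on", "or", "the", "to"}


def is_named_entity(title: str, article_text: str) -> bool:
    """Apply the three capitalization heuristics to a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(ch.isupper() for ch in words[0]) > 1:
        # Rule 2: single-word title with multiple capital letters (UNICEF, WHO).
        return True
    # Rule 3: at least 75% of the title's occurrences in the article are capitalized.
    pattern = r"\b" + re.escape(" ".join(words)) + r"\b"
    hits = re.findall(pattern, article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for hit in hits if hit[0].isupper())
    return capitalized / len(hits) >= 0.75
```

Rules 1 and 2 decide from the title alone; only titles that fail them fall through to the more expensive scan of the article text.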
Example:
- $e_i$: President_of_the_United_States
- $t_k$: 11/2001
- $s_j$: “George W. Bush”
- $\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], “Barack Obama” is the anchor text linking to the article President_of_the_United_States
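Steps 1 and 2 amount to scanning the wiki links of a snapshot. The sketch below assumes raw wikitext as input and uses hypothetical helper names; its regular expression covers only plain `[[Target]]`, piped `[[Target|anchor]]` and section `[[Target#Section|anchor]]` links (templates, redirects and namespaces are ignored), and keeping link counts is an illustrative choice.

```python
import re
from collections import Counter

# [[Target]], [[Target|anchor text]] or [[Target#Section|anchor text]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]")


def normalize_title(title: str) -> str:
    """MediaWiki treats spaces and underscores alike and capitalizes the first letter."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def extract_synonyms(snapshot_texts, entities):
    """Collect the synonym snapshot S_tk as counts of (entity, anchor text) pairs.

    snapshot_texts: wikitext of the article revisions in snapshot W_tk
    entities: normalized titles of the named entities E_tk
    """
    s_tk = Counter()
    for text in snapshot_texts:
        for target, anchor in WIKILINK.findall(text):
            entity = normalize_title(target)
            if entity in entities:
                # An unpiped link uses the target title itself as the anchor text.
                synonym = (anchor or target).strip()
                s_tk[(entity, synonym)] += 1
    return s_tk
```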
Extracting synonyms
Output: entity-synonym relationships and time periods

Named Entity             Synonym                      Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger    05/2005 - 03/2009
                         Joseph Ratzinger             05/2005 - 03/2009
                         Pope Benedict XVI            05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II      02/2007 - 03/2009
                         Sen. Barack Obama            07/2007 - 03/2009
                         Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton              08/2003 - 03/2009
                         Sen. Hillary Clinton         03/2007 - 03/2009
                         Senator Clinton              11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the article contents.
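The time period of each relationship can be derived from the monthly synonym snapshots. A minimal sketch with hypothetical helper names, assuming a period simply runs from the first to the last snapshot that contains the pair (gaps in between are ignored):

```python
from datetime import date


def time_periods(snapshots: dict[date, list[tuple[str, str]]]) -> dict:
    """Map each (entity, synonym) pair to the first and last snapshot containing it.

    snapshots: {snapshot date: iterable of (entity, synonym) pairs}
    """
    periods = {}
    for t_k in sorted(snapshots):
        for pair in snapshots[t_k]:
            first, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (first, t_k)
    return periods


def fmt(period: tuple[date, date]) -> str:
    """Render a period in the MM/YYYY - MM/YYYY style used on the slides."""
    first, last = period
    return f"{first:%m/%Y} - {last:%m/%Y}"
```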
Improving the accuracy of time using burst detection
- Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
  - 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst detection algorithm [Kleinberg, KDD’2002]
  - Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
  - Output bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
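The burst detection step can be sketched with Kleinberg's two-state model for batched document streams, using the parameter values listed in the experiment settings (2 states, rate ratio s = 2, γ = 1). This is our reading of Kleinberg's model, not the implementation used in the talk: batch t holds d[t] NYT articles, r[t] of which mention the synonym; the base rate is the overall fraction of mentioning articles and the burst rate is s times that; entering the burst state costs γ·ln n; and a burst's weight is the total cost it saves over the base state. Mapping batch indices to months is left to the caller.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection over batched counts, after Kleinberg (KDD 2002).

    r[t]: documents in batch t mentioning the synonym; d[t]: all documents in batch t.
    Returns (first_batch, last_batch, weight) for every run of the burst state.
    """
    n = len(r)
    if n == 0 or sum(r) == 0 or sum(r) >= sum(d):
        return []                              # no variation in rate to detect
    p0 = sum(r) / sum(d)                       # base state: overall rate
    p1 = min(p0 * s, 0.9999)                   # burst state: s times the base rate
    rates = (p0, p1)

    def cost(i, t):
        # -log likelihood of r[t] hits in d[t] documents under rate p_i; the
        # binomial coefficient is the same for both states, so it is omitted.
        p = rates[i]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    up = gamma * math.log(n)                   # cost of entering the burst state
    best = [cost(0, 0), up + cost(1, 0)]       # the automaton starts in state 0
    back = []
    for t in range(1, n):
        # Moving down is free; moving up costs `up`.
        prev0 = 0 if best[0] <= best[1] else 1
        prev1 = 1 if best[1] <= best[0] + up else 0
        back.append((prev0, prev1))
        best = [min(best[0], best[1]) + cost(0, t),
                min(best[1], best[0] + up) + cost(1, t)]
    # Trace back the minimum-cost state sequence.
    states = [0 if best[0] <= best[1] else 1]
    for prev0, prev1 in reversed(back):
        states.append(prev0 if states[-1] == 0 else prev1)
    states.reverse()
    # Report each maximal burst run, weighted by the cost it saves over state 0.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

With only two states, the "ratio of each subsequent state to the previous state" parameter has no effect; it matters only when more burst levels are allowed.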
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm

Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
- Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
- E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
- Related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered
- E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
- $pf(s_j)$ is the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the average tf of $s_j$ over all time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high average frequency over time
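The TIDP score can be computed directly from per-partition term frequencies. A minimal sketch with hypothetical names; as on the slide, $pf$ and $\overline{tf}$ are mixed without normalization, although their scales differ in practice and some normalization may be needed.

```python
def tidp(partition_tf, mu=0.5):
    """Rank synonyms by TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    partition_tf: {synonym: {time partition: tf(s_j, p_i)}}
    Returns (synonym, score) pairs sorted by descending score.
    """
    scores = {}
    for synonym, tfs in partition_tf.items():
        counts = [tf for tf in tfs.values() if tf > 0]
        pf = len(counts)                       # partitions in which s_j occurs
        if pf == 0:
            continue
        avg_tf = sum(counts) / pf              # sum_i tf(s_j, p_i) / pf(s_j)
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Setting `mu=1.0` ranks purely by robustness over time (number of partitions), and `mu=0.0` purely by average frequency.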
Ranking time-dependent synonyms
Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
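Because TDP is just the frequency at $t_k$, ranking reduces to a sort. A minimal sketch with hypothetical names, where the time key can be whatever partition label (a month, a year) the caller uses:

```python
def tdp_rank(tf_at_time, t_k, k=None):
    """Rank time-dependent synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at_time: {synonym: {time: tf}}; synonyms absent at t_k are dropped.
    """
    scored = [(syn, tfs.get(t_k, 0)) for syn, tfs in tf_at_time.items()]
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored if k is None else scored[:k]
```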
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg, with parameters:
  - Number of states: 2
  - Ratio of the rate of the second state to the base state: 2
  - Ratio of the rate of each subsequent state to the previous state: 2
  - Gamma parameter of the HMM: 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004)
- 250 topics (topics 301-450 and topics 601-700)
Tools
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights $w_1, \ldots, w_k$:
$$q_{exp} = q_{org} \;\; s_1{\wedge}w_1 \;\; s_2{\wedge}w_2 \; \cdots \; s_k{\wedge}w_k$$
where $s_i{\wedge}w_i$ denotes synonym $s_i$ boosted by weight $w_i$
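The expanded query can be assembled as a plain string. In this sketch the `term^weight` notation is assumed to correspond to the slide's $s_i{\wedge}w_i$ boost, and quoting multi-word synonyms as phrases is an illustrative choice; whether a particular engine, Terrier included, accepts exactly this syntax for boosted phrases should be checked against its query-language documentation. `expand_query` is a hypothetical helper.

```python
def expand_query(original, ranked_synonyms, k=3):
    """Append the top-k synonyms to the original query, each boosted by its weight.

    ranked_synonyms: (synonym, weight) pairs sorted by descending weight, e.g. TIDP scores.
    Multi-word synonyms are quoted as phrases.
    """
    parts = [original]
    for synonym, weight in ranked_synonyms[:k]:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```

Feeding it the output of a TIDP ranking gives $q_{exp}$ directly, since both use (synonym, score) pairs.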
Measurement: Mean Average Precision (MAP), R-precision and Recall
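The effectiveness measures used in the evaluation (MAP, R-precision and recall here, and precision at 10, 20 and 30 for the time-dependent queries) follow standard definitions. A minimal sketch of those textbook definitions, such as tools like trec_eval compute; it is not necessarily the exact evaluation code behind the reported numbers.

```python
def precision_at_k(ranking, relevant, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def recall(ranking, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    return len(set(ranking) & set(relevant)) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```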
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: Precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (a temporal query is a named entity plus a time period) and their synonyms:

Named Entity                    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H. W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

NER Method   No. of NE   No. of NE-Syn   Avg. Syn per NE   Accuracy (%)
BPF-NERW     2574319     3199115         1.2               51
BPCF-NERW    473829      488383          1.0               73
BPF-NERW: Bunescu and Pasca’s Named Entity Recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2, since k > 2 brings noise into the NERQ process
Number of queries under the two NERQ methods:

Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance level: p < 0.05)

             MW-NERQ                         MRW-NERQ
Method       MAP      R-precision  Recall    MAP      R-precision  Recall
PM           0.2889   0.3309       0.6185    0.2455   0.2904       0.5629
PRF          0.3469   0.3711       0.6944    0.3002   0.3227       0.6761
SQE-PRF      0.3608   0.3652       0.7405    0.2507   0.2665       0.5932
SWQE-PRF     0.3653   0.3861       0.7388    0.2885   0.3080       0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
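The Bose-Einstein 1 (Bo1) weighting named in the note scores each candidate term $t$ from the top-ranked documents as $w(t) = tf_x \log_2\frac{1+P_n}{P_n} + \log_2(1+P_n)$ with $P_n = F/N$, where $tf_x$ is the term's frequency in those documents, $F$ its frequency in the collection and $N$ the number of documents. A minimal sketch of that standard formula under assumptions: tokens are already stemmed and stopword-free (Bo1 still gives very frequent terms a non-trivial score), and the step that merges the weighted terms into the original query is not shown.

```python
import math
from collections import Counter


def bo1_expansion_terms(top_docs, collection_freq, num_docs, n_terms=40):
    """Rank candidate expansion terms with the Bose-Einstein 1 (Bo1) DFR model.

    top_docs: token lists of the top-ranked (pseudo-relevant) documents
    collection_freq: {term: F}, the term's frequency in the whole collection
    num_docs: N, the number of documents in the collection
    """
    tf_x = Counter(tok for doc in top_docs for tok in doc)   # frequency in top docs
    scores = {}
    for term, tfx in tf_x.items():
        p_n = collection_freq.get(term, tfx) / num_docs         # P_n = F / N
        scores[term] = tfx * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
```

With the settings in the note, `top_docs` would hold the top-10 retrieved documents and `n_terms` would be 40.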
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level: p < 0.05)

Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
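TSQ can be sketched as filtering the synonym candidates by their time periods before expanding the keyword. As an illustration only, this assumes that a synonym qualifies when its validity period overlaps the query period $t_q$ and that periods are whole years; `TimedSynonym` and `temporal_synonym_query` are hypothetical names, and the returned dict stands in for however a search engine accepts keywords plus a date restriction.

```python
from typing import NamedTuple


class TimedSynonym(NamedTuple):
    synonym: str
    start: int  # first year in which the synonym is valid (inclusive)
    end: int    # last year in which the synonym is valid (inclusive)


def synonyms_for_period(candidates, q_start, q_end):
    """Keep the synonyms whose validity period overlaps the query period [q_start, q_end]."""
    return [c.synonym for c in candidates if c.start <= q_end and q_start <= c.end]


def temporal_synonym_query(keyword, q_start, q_end, candidates):
    """TSQ-style query: keyword w_q plus synonyms valid at t_q, restricted to t_q."""
    expansion = [s for s in synonyms_for_period(candidates, q_start, q_end) if s != keyword]
    return {"keywords": [keyword] + expansion, "time": (q_start, q_end)}
```

Dropping the time filter (and the date restriction) would turn this back into an ordinary synonym expansion, which is where the time-independent TIDP ranking applies instead.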
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using NYT
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Contributions
1 Formal modelsWikipedia viewed as a temporal resource
2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms
3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Contributions
1 Formal modelsWikipedia viewed as a temporal resource
2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms
3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Contributions
1 Formal modelsWikipedia viewed as a temporal resource
2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms
3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Figure A snapshot of Wikipediaand current revisions at time tk
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized
President_of_the_United_StatesrArr named entity
2) Single-word titles with multiple capitalletters
UNICEF and WHO are namedentities
3) 75 of occurrences in the article textitself are capitalized
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate the entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
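A minimal sketch of the anchor-text extraction step, assuming the article wikitext is available as a string and that entities are keyed by their underscore-separated titles. The regex is a simplification (it ignores templates, namespaces, and Wikipedia's first-letter case folding), and `extract_synonyms` is a hypothetical name.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|anchor text]] and [[Target#Section|anchor text]].
# Group 1 is the link target, group 2 the optional anchor text.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(wikitext: str, entities: set[str]) -> Counter:
    """Count (entity, anchor text) pairs for links that point at known named entities."""
    pairs: Counter = Counter()
    for target, anchor in WIKILINK.findall(wikitext):
        entity = target.strip().replace(" ", "_")
        if entity not in entities:
            continue  # the link does not point at a recognized named entity
        # Unpiped links use the target title itself as the surface form.
        surface = (anchor or target).strip()
        pairs[(entity, surface)] += 1
    return pairs
```

Running it over every article in snapshot $W_{t_k}$ and summing the counters would give one synonym snapshot; the counts can later serve as term frequencies.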
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
Note: the time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the article contents.
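Given monthly synonym snapshots $S_{t_k}$, a time period like those in the table can be derived by recording the first and last snapshot in which each entity-synonym pair appears. The sketch below assumes snapshots keyed by the first day of each month and treats a period as a single first-to-last span, ignoring any months in between where the pair is absent; both choices, and the name `synonym_periods`, are assumptions.

```python
from datetime import date


def synonym_periods(
    snapshots: dict[date, set[tuple[str, str]]],
) -> dict[tuple[str, str], tuple[date, date]]:
    """Map each (entity, synonym) pair to the first and last snapshot month containing it."""
    periods: dict[tuple[str, str], tuple[date, date]] = {}
    for month in sorted(snapshots):  # process snapshots in chronological order
        for pair in snapshots[month]:
            start = periods[pair][0] if pair in periods else month
            periods[pair] = (start, month)  # extend the end to the latest month seen
    return periods
```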
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
- 1.8M articles from January 1987 to June 2007 (20 years)
- Use the burst-detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{i,j}$ by computing its rate of occurrence in the document stream
- Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
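To make the burst-detection step concrete, here is a compact re-implementation of Kleinberg's two-state model for batched document streams: each time unit $t$ has $d_t$ documents, $r_t$ of which mention the synonym, and a Viterbi pass chooses between a base rate and an elevated rate. This is a sketch written for illustration, not the Kleinberg implementation used in the experiments; the defaults (two states, rate ratio s = 2, γ = 1) mirror the parameter settings listed in the experiment setting, while the function name and input format are assumptions. A burst's weight is the total cost saved by being in the elevated state over the burst.

```python
import math


def _cost(p: float, r: int, d: int) -> float:
    """Negative log-likelihood of r relevant out of d documents under rate p."""
    log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))


def kleinberg_bursts(r: list[int], d: list[int], s: float = 2.0, gamma: float = 1.0) -> list[tuple[int, int, float]]:
    """Two-state burst detection on batched counts; returns (start, end, weight) bursts."""
    n = len(r)
    if n == 0 or sum(r) == 0 or sum(r) == sum(d):
        return []  # degenerate streams have no bursts
    p0 = sum(r) / sum(d)          # base rate over the whole stream
    p1 = min(s * p0, 0.9999)      # elevated rate of the bursty state
    up = gamma * math.log(n)      # cost of moving from state 0 to state 1
    cost = [(_cost(p0, r[t], d[t]), _cost(p1, r[t], d[t])) for t in range(n)]

    # Viterbi: best[j] is the cheapest cost of a state sequence ending in state j.
    best = [cost[0][0], up + cost[0][1]]  # the automaton starts in the base state
    back = []
    for t in range(1, n):
        stay0, to1 = best[0], best[0] + up
        down, stay1 = best[1], best[1]    # moving down is free
        back.append((0 if stay0 <= down else 1, 0 if to1 <= stay1 else 1))
        best = [min(stay0, down) + cost[t][0], min(to1, stay1) + cost[t][1]]

    # Backtrack the optimal state sequence.
    states = [0 if best[0] <= best[1] else 1]
    for prev in reversed(back):
        states.append(prev[states[-1]])
    states.reverse()

    # Collect maximal runs of state 1; weight = cost saved by being bursty.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            bursts.append((start, t - 1, sum(cost[i][0] - cost[i][1] for i in range(start, t))))
        else:
            t += 1
    return bursts
```

With monthly batches, `r[t]` would be the number of NYT articles in month `t` mentioning the synonym and `d[t]` the total number of articles that month; each returned (start, end) pair maps back to a bursty interval.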
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
- Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
- E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
- Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
- E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
- $pf(s_j)$ is the time-partition frequency, i.e., the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the average tf of $s_j$ over the time partitions: $\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ balances the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high average frequency over time
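The TIDP score can be computed directly from per-partition term frequencies. The sketch assumes the counts for one synonym are given as a mapping from a partition label to tf, so partitions with tf = 0 do not contribute to pf; it applies the formula exactly as written, without any normalization of pf or the average tf, which the definition does not specify. The helper names are hypothetical.

```python
def tidp(tf_by_partition: dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * average tf over the partitions where s_j occurs."""
    counts = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(counts)  # number of time partitions containing the synonym
    if pf == 0:
        return 0.0
    avg_tf = sum(counts) / pf
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(candidates: dict[str, dict[str, int]], mu: float = 0.5) -> list[tuple[str, float]]:
    """Rank candidate synonyms of one entity by TIDP, highest first (ties alphabetical)."""
    scores = [(syn, tidp(tfs, mu)) for syn, tfs in candidates.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

As a worked example with toy counts, tf values of 4 and 2 in two partitions (and 0 in a third) give pf = 2 and an average tf of 3, so with μ = 0.5 the score is 0.5·2 + 0.5·3 = 2.5.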
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest.
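For a temporal query at time $t_k$, ranking time-dependent synonyms therefore reduces to sorting by term frequency at $t_k$. In this hypothetical sketch, `tf_at` maps (synonym, time period) pairs to term frequencies; how time periods are labeled and the cut-off `k` are assumptions.

```python
def rank_time_dependent(tf_at: dict[tuple[str, str], int], t_k: str, k: int = 10) -> list[tuple[str, int]]:
    """Top-k synonyms at time t_k ranked by TDP(s_j, t_k) = tf(s_j, t_k)."""
    scored = [(syn, tf) for (syn, period), tf in tf_at.items() if period == t_k and tf > 0]
    # Highest term frequency first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]
```

Synonyms that never occur at $t_k$ are dropped entirely, which is what distinguishes this ranking from the time-independent one.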
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
- Burst-detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004): 250 topics (topics 301-450 and 601-700)
Tools
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$$
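The expanded query $q_{exp}$ can be assembled as a string in which each synonym carries its TIDP score as a boost. This sketch assumes a `term^weight` boost notation matching the formula and wraps multi-word synonyms in double quotes as phrases; how a given engine (here, Terrier) treats boosted phrases should be checked against its query-language documentation. `expand_query` is a hypothetical helper.

```python
def expand_query(q_org: str, synonyms: list[tuple[str, float]], k: int = 3) -> str:
    """Append the top-k synonyms to q_org, each boosted by its TIDP score (term^weight)."""
    top = sorted(synonyms, key=lambda item: item[1], reverse=True)[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Multi-word synonyms are quoted as phrases (an assumption about the query syntax).
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```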
Measurement: Mean Average Precision (MAP), R-precision and recall
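The three measures can be computed per topic from a ranked list and a set of relevant documents, with MAP averaging average precision over topics. A minimal sketch assuming binary relevance judgments; standard TREC tooling such as trec_eval computes the same measures, and these helper names are illustrative.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # unretrieved relevant documents contribute 0


def r_precision(ranking: list[str], relevant: set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranking[:r]) / r if r else 0.0


def recall(ranking: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that appear anywhere in the ranking."""
    return len(set(ranking) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP over topics; each run is a (ranking, relevant set) pair."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)
```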
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com: more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms:
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
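Precision at a cutoff k, as used for the P@10, P@20 and P@30 figures, counts the relevant documents among the top k results and divides by k. A small sketch under the same binary-relevance assumption; `mean_precision_at_k`, which averages over the temporal queries, is a hypothetical helper.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant (always divides by k)."""
    return sum(doc in relevant for doc in ranking[:k]) / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average P@k over queries; each run is a (ranking, relevant set) pair."""
    return sum(precision_at_k(rk, rel, k) for rk, rel in runs) / len(runs)
```

Dividing by k even when fewer than k documents are retrieved follows the usual TREC convention, so a short result list is penalized rather than rewarded.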
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust 2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; values of k > 2 bring noise into the NERQ process
Number of queries under the two NERQ methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall ( indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving time of synonyms; evaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Contributions
1 Formal modelsWikipedia viewed as a temporal resource
2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms
3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Problem StatementContributions
Contributions
1 Formal modelsWikipedia viewed as a temporal resource
2 Proposed approachesDiscover time-based synonyms over timeImprove the accuracy of time of synonymsExpand a query using time-based synonyms
3 ExperimentsEvaluate extracting and improving time of synonymsEvaluate query expansion using time-based synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Figure A snapshot of Wikipediaand current revisions at time tk
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized
President_of_the_United_StatesrArr named entity
2) Single-word titles with multiple capitalletters
UNICEF and WHO are namedentities
3) 75 of occurrences in the article textitself are capitalized
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example
ei President_of_the_United_States
tk 112001
sj ldquoGeorge W Bushrdquo
ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot \mathrm{pf}(s_j) + (1 - \mu) \cdot \mathrm{tf}(s_j)$$
$\mathrm{pf}(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
$\mathrm{tf}(s_j)$ is the averaged term frequency of $s_j$ over all time partitions: $\mathrm{tf}(s_j) = \frac{\sum_i \mathrm{tf}(s_j, p_i)}{\mathrm{pf}(s_j)}$
$\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
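A minimal sketch of the TIDP mixture, assuming per-partition term frequencies are already available as a dictionary. The slides do not say whether pf and tf are normalized before mixing, so this sketch combines the raw values; the function names and the partition labels are hypothetical.

```python
def tidp_scores(tf_by_partition, mu=0.5):
    """TIDP(s) = mu * pf(s) + (1 - mu) * averaged tf(s).

    tf_by_partition maps each synonym to {time partition: term frequency}.
    Raw pf and tf values are mixed without normalization (an assumption).
    """
    scores = {}
    for synonym, counts in tf_by_partition.items():
        occurring = [tf for tf in counts.values() if tf > 0]
        pf = len(occurring)  # number of partitions in which the synonym occurs
        if pf == 0:
            scores[synonym] = 0.0
            continue
        avg_tf = sum(occurring) / pf  # sum_i tf(s, p_i) / pf(s)
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


def top_k(scores, k):
    """Return the k highest-scoring synonyms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With hypothetical counts, a synonym seen in three monthly partitions (tf 4, 6, 2) scores 0.5 * 3 + 0.5 * 4 = 3.5, while one seen in a single partition with tf 3 scores 2.0, so the more time-robust synonym ranks first.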
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = \mathrm{tf}(s_j, t_k)$$
where $\mathrm{tf}(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest
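Because TDP reduces to the term frequency at $t_k$, ranking is a lookup and sort. A minimal sketch, assuming the same hypothetical {synonym: {partition: tf}} layout and month labels such as "2004-11"; the function name and the counts in the example are illustrative only.

```python
def tdp_top_k(tf_by_partition, t_k, k):
    """Rank synonyms at time t_k by TDP(s, t_k) = tf(s, t_k).

    tf_by_partition maps each synonym to {time partition: term frequency};
    synonyms that do not occur at t_k are skipped.
    """
    scored = [(counts.get(t_k, 0), synonym)
              for synonym, counts in tf_by_partition.items()]
    scored = [(tf, synonym) for tf, synonym in scored if tf > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest tf first, ties by name
    return [(synonym, tf) for tf, synonym in scored[:k]]
```

For instance, with hypothetical counts, querying partition "2004-11" returns "Cardinal Joseph Ratzinger" ahead of "Joseph Ratzinger", while "Pope Benedict XVI" only appears for partitions after May 2005.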
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper); Oracle Berkeley DB version 4.7.25
Burst-detection algorithm as implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores w_1, ..., w_k as boosting weights:
q_exp = q_org s_1^w_1 s_2^w_2 ... s_k^w_k (where s^w boosts synonym s by weight w)
Measurement: Mean Average Precision (MAP), R-precision and recall
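The expanded query can be assembled as a string. A minimal sketch, assuming the TIDP scores are already computed; the `term^weight` notation is modeled on the formula above, the function name is hypothetical, and whether a given engine accepts a boost on a quoted multi-word phrase is an assumption to check against that engine's query-language documentation.

```python
def expand_query(original, weighted_synonyms, k):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from the top-k weighted synonyms.

    weighted_synonyms maps each synonym to its TIDP score (used as the boost).
    Multi-word synonyms are wrapped in quotes.
    """
    top = sorted(weighted_synonyms.items(), key=lambda kv: kv[1], reverse=True)[:k]
    parts = [original]
    for synonym, weight in top:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)
```

For example, expanding "barack obama" with k = 2 keeps only the two highest-weighted synonyms and drops the rest.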
Query expansion using time-dependent synonyms
Data collection: NewsLibrary.com, which contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (temporal query = named entity + time period):
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
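Precision at a cutoff k, the measure used for the temporal queries, is the fraction of the top-k retrieved documents that are relevant. A minimal sketch with hypothetical function names; it divides by k even when fewer than k documents are retrieved, which is the common convention (e.g. in trec_eval).

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_precision_at_k(runs, k):
    """Average P@k over queries; runs is a list of (ranked list, relevant set)."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

With 30 ranked documents of which d2, d5 and d14 are relevant, P@10 = 0.2, P@20 = 0.15 and P@30 = 0.1.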
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries under the two NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (* indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms, TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
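The three measures in this table follow standard definitions. A minimal sketch of how they can be computed from a ranked list and a set of relevant documents; the function names are hypothetical, and the slides do not say which evaluation tool produced the reported numbers.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # unretrieved relevant documents contribute 0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0


def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked list, relevant set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For a ranking d1, d2, d3, d4 with relevant set {d1, d3, d5}, AP = (1/1 + 2/3) / 3 = 5/9, while R-precision and recall are both 2/3.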
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (* indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms w.r.t. time $t_q$
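The TSQ setting can be sketched as a filter over time-stamped synonyms. This sketch treats each row of the temporal-query examples as the interval in which a synonym applies (an interpretation), uses years as the time granularity (an assumption), and only selects synonyms without weighting them; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedSynonym:
    entity: str
    synonym: str
    start: int  # first year the synonym is associated with the entity
    end: int    # last year (inclusive)


def expand_temporal_query(w_q, t_q, synonyms):
    """TSQ-style expansion: keep synonyms of w_q whose period contains t_q."""
    extra = [s.synonym for s in synonyms
             if s.entity == w_q and s.start <= t_q <= s.end]
    return [w_q] + extra
```

For example, a query for "Pope Benedict XVI" at 1995 is expanded with "Cardinal Ratzinger", while the same query at 2008 is left unexpanded.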
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches: discover time-based synonyms over time; improve the accuracy of the time of synonyms; expand a query using time-based synonyms
3. Experiments: evaluate extracting and improving the time of synonyms; evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
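Step 1 can be sketched as taking, for each monthly cut-off, the latest revision of every page at or before that time. This assumes the revision history has already been loaded into memory as sorted (timestamp, text) lists and that cut-offs are the first day of each month; the function name and data layout are hypothetical, not the authors' MWDumper/Berkeley DB pipeline.

```python
from bisect import bisect_right
from datetime import datetime


def monthly_snapshots(revisions, cutoffs):
    """Build W = {W_t1, ..., W_tz} from a revision history.

    revisions: page title -> list of (timestamp, text), sorted by timestamp
    cutoffs:   snapshot times, e.g. the first day of each month
    Returns {cutoff: {title: text of the latest revision at or before cutoff}}.
    """
    times = {title: [ts for ts, _ in revs] for title, revs in revisions.items()}
    snapshots = {}
    for cutoff in cutoffs:
        snapshot = {}
        for title, revs in revisions.items():
            i = bisect_right(times[title], cutoff)  # revisions with ts <= cutoff
            if i:  # the page already existed at this cutoff
                snapshot[title] = revs[i - 1][1]
        snapshots[cutoff] = snapshot
    return snapshots
```

Pages created after a cut-off are simply absent from that snapshot, so each snapshot reflects Wikipedia as it stood at that time.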
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
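The three capitalization heuristics can be sketched as a single predicate over a title and its article text. The function-word list below is a simplified assumption (the original heuristics exclude prepositions, determiners, conjunctions, relative pronouns and negations), and the function name is hypothetical.

```python
import re

# Simplified list of non-content words (an assumption, not the original list)
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "by",
                  "with", "from", "and", "or", "but", "that", "which", "who", "not"}


def is_named_entity_title(title, article_text):
    """Apply the three capitalization heuristics to a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # 1) multi-word title: all content words must be capitalized
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # 2) single-word title with more than one capital letter (UNICEF, WHO)
    if sum(ch.isupper() for ch in word) > 1:
        return True
    # 3) otherwise: at least 75% of its occurrences in the article are capitalized
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    capitalized = sum(1 for h in hits if h[0].isupper())
    return capitalized / len(hits) >= 0.75
```

Under these rules "Computer_science" is rejected because "science" is a lowercase content word, while "President_of_the_United_States" passes because "of" and "the" are ignored.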
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate the entity-synonym relationships of all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
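Anchor-text extraction from wiki markup can be sketched with a regular expression over internal links of the form [[Target]] or [[Target|anchor text]]. This is a simplification (it ignores templates, nested links and title case-folding), and the pattern and function name are hypothetical rather than the authors' extractor.

```python
import re
from collections import defaultdict

# [[Target]], [[Target#Section]] or [[Target|anchor text]]
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(wikitext, entities):
    """Map each entity title to the set of anchor texts that link to it."""
    synonyms = defaultdict(set)
    for m in LINK_RE.finditer(wikitext):
        target = m.group(1).strip().replace(" ", "_")
        anchor = (m.group(2) or m.group(1)).strip()  # unpiped links use the title
        if target in entities:
            synonyms[target].add(anchor)
    return synonyms
```

Running this over every article of a snapshot and merging the results yields the synonym snapshot for that month.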
Output: entity-synonym relationships and time periods
Named entity | Synonym | Time period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the content
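Time periods like those above can be derived from the monthly synonym snapshots by recording the first and last month in which each relationship appears. A minimal sketch, assuming snapshots are keyed by (year, month) tuples and that gaps between appearances are ignored (both assumptions); the function name is hypothetical.

```python
def synonym_periods(snapshots):
    """Derive a time period for each entity-synonym relationship.

    snapshots: {(year, month): set of (entity, synonym) pairs seen in that snapshot}
    Returns {(entity, synonym): (first (year, month), last (year, month))}.
    """
    periods = {}
    for month in sorted(snapshots):  # (year, month) tuples sort chronologically
        for pair in snapshots[month]:
            first = periods[pair][0] if pair in periods else month
            periods[pair] = (first, month)
    return periods
```

Because the periods come from snapshot timestamps, they can only be as early as the first Wikipedia revision mentioning the synonym, which is why the NYT burst analysis is used to refine them.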
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$q_{exp} = q_{org}\ s_1{\wedge}w_1\ s_2{\wedge}w_2 \cdots s_k{\wedge}w_k$
Measurement Mean Average Precision (MAP) R-precision and Recall
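To make the expansion concrete, here is a minimal sketch that builds $q_{exp}$ as a query string. It assumes a `term^weight` boosting syntax of the kind Terrier's query language accepts, and quotes multi-word synonyms as phrases; whether a quoted phrase can be boosted this way should be checked against the documentation of the Terrier version in use. The TIDP scores are taken as given, and the example scores are made up.

```python
from typing import Dict

def expand_query(q_org: str, tidp_scores: Dict[str, float], k: int) -> str:
    """Append the top-k synonyms to q_org, each boosted by its TIDP score (assumed ^ syntax)."""
    top = sorted(tidp_scores.items(), key=lambda x: (-x[1], x[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        term = f'"{synonym}"' if " " in synonym else synonym  # phrase-quote multi-word synonyms
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)

# Made-up TIDP scores for illustration
scores = {"Barack Hussein Obama II": 2.5, "Senator Obama": 1.5, "Obama": 0.75}
print(expand_query("barack obama", scores, k=2))
# barack obama "Barack Hussein Obama II"^2.50 "Senator Obama"^1.50
```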
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (named entity and time period) and their synonyms
Named Entity                   Time Period  Synonym
American Broadcasting Company  1995-2000    Disney/ABC
Barack Obama                   2005-2007    Senator Obama
Eminem                         1999-2004    Slim Shady
George H. W. Bush              1988-1992    President George H. W. Bush
George W. Bush                 2000-2007    President George W. Bush
Hillary Rodham Clinton         2001-2007    Senator Clinton
Kmart                          1987-1987    Kresge
Pope Benedict XVI              1988-2005    Cardinal Ratzinger
Ronald Reagan                  1987-1989    Reagan Revolution
Virgin Media                   1999-2002    Telewest Communications
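The difference between searching a temporal query as-is and expanding it with synonyms valid at the query time can be sketched as follows. This is a hypothetical illustration: it assumes year-granularity periods and a simple containment test, since the slides do not spell out how the synonym set at $t_q$ is selected. The two rows reuse entries from the table of example queries.

```python
from typing import List, Tuple

# (named entity, synonym, first year, last year)
Period = Tuple[str, str, int, int]

def temporal_expand(entity: str, t_q: int, periods: List[Period]) -> List[str]:
    """Keep the entity plus its synonyms whose period contains the query year t_q (assumed rule)."""
    synonyms = [s for e, s, start, end in periods if e == entity and start <= t_q <= end]
    return [entity] + synonyms

periods = [
    ("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    ("Barack Obama", "Senator Obama", 2005, 2007),
]
print(temporal_expand("Pope Benedict XVI", 1995, periods))
# ['Pope Benedict XVI', 'Cardinal Ratzinger']
```

Outside the period (e.g. t_q = 2008) the query stays unexpanded, which is the intended contrast with time-independent expansion.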
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method  NE         NE-Syn     Avg. Syn per NE  Accuracy (%)
BPF-NERW    2,574,319  3,199,115  1.2              51
BPCF-NERW   473,829    488,383    1.0              73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2, since k > 2 brings noise into the NERQ process
Number of queries per type using the two different NER methods
Type              MW-NERQ  MRW-NERQ
Named entity      42       149
Not named entity  208      101
Total             250      250
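A rough sketch of the two query-side NER methods follows. It assumes a case-insensitive exact match against Wikipedia titles, and a caller-supplied `related` function standing in for whatever retrieves related Wikipedia pages (that function, and the choice to union its results with the exact match, are assumptions, not the authors' implementation). The default k = 2 follows the setting above.

```python
from typing import Callable, Dict, List

def mw_nerq(query: str, titles: Dict[str, str]) -> List[str]:
    """MW-NERQ: the query is a named entity only if it exactly matches a page title."""
    key = "_".join(query.lower().split())  # normalise to lower-cased title form
    return [titles[key]] if key in titles else []

def mrw_nerq(query: str, titles: Dict[str, str],
             related: Callable[[str, int], List[str]], k: int = 2) -> List[str]:
    """MRW-NERQ: the exact match plus the top-k related pages returned by `related`."""
    entities = mw_nerq(query, titles)
    for page in related(query, k):
        if page not in entities:
            entities.append(page)
    return entities

titles = {"barack_obama": "Barack_Obama"}  # hypothetical index: lower-cased title -> title

def fake_related(query: str, k: int) -> List[str]:
    """Hypothetical stand-in for a related-page search."""
    return ["Barack_Obama", "Michelle_Obama", "Joe_Biden"][:k]

print(mw_nerq("Barack Obama", titles))                  # ['Barack_Obama']
print(mrw_nerq("obama speech", titles, fake_related))   # ['Barack_Obama', 'Michelle_Obama']
```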
Query expansion using time-independent synonyms
MAP, R-precision and Recall (* indicates statistically significant at p < 0.05)
           MW-NERQ                       MRW-NERQ
Method     MAP     R-precision  Recall   MAP     R-precision  Recall
PM         0.2889  0.3309       0.6185   0.2455  0.2904       0.5629
PRF        0.3469  0.3711       0.6944   0.3002  0.3227       0.6761
SQE-PRF    0.3608  0.3652       0.7405   0.2507  0.2665       0.5932
SWQE-PRF   0.3653  0.3861       0.7388   0.2885  0.3080       0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (* indicates statistically significant at p < 0.05)
Method  P@10    P@20    P@30
TQ      0.1000  0.0500  0.0333
TSQ     0.5200  0.3800  0.2800
TQ: search with a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
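Precision at k is the fraction of the top-k retrieved documents that are relevant; the reported figures are averages over the 20 queries. A minimal per-query sketch (document identifiers are hypothetical):

```python
from typing import List, Set

def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant (divides by k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

ranking = [f"d{i}" for i in range(30)]  # hypothetical ranked result list
relevant = {"d0"}                        # a single relevant document at rank 1
print([round(precision_at_k(ranking, relevant, k), 4) for k in (10, 20, 30)])
# [0.1, 0.05, 0.0333]
```

With one relevant document in the top 10 the three values fall as 1/10, 1/20 and 1/30, which shows why P@20 and P@30 can only stay level or drop when no further relevant documents are found.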
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Contributions
1. Formal models: Wikipedia viewed as a temporal resource
2. Proposed approaches:
Discover time-based synonyms over time
Improve the accuracy of the time of synonyms
Expand a query using time-based synonyms
3. Experiments:
Evaluate extracting and improving the time of synonyms
Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
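Step 1 can be pictured as replaying the revision history month by month and keeping, for every page, the latest revision seen so far. The sketch below works on an in-memory list of (page, timestamp, text) records; that record shape is hypothetical, deleted pages are ignored, and months without any revision get no snapshot. A real run over the full history would need on-disk storage (the slides list MWDumper and Berkeley DB among the tools) rather than copying dictionaries.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical revision record: (page title, revision timestamp, wikitext)
Revision = Tuple[str, datetime, str]

def monthly_snapshots(revisions: Iterable[Revision]) -> Dict[str, Dict[str, str]]:
    """Return {'YYYY-MM': {page: latest revision text as of the end of that month}}."""
    by_month: Dict[str, List[Tuple[datetime, str, str]]] = defaultdict(list)
    for page, ts, text in revisions:
        by_month[ts.strftime("%Y-%m")].append((ts, page, text))
    snapshots: Dict[str, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    for month in sorted(by_month):                   # 'YYYY-MM' sorts chronologically
        for ts, page, text in sorted(by_month[month]):
            current[page] = text                     # a later revision replaces an earlier one
        snapshots[month] = dict(current)             # W_tk: current revision of every page so far
    return snapshots

revs = [
    ("A", datetime(2001, 3, 5), "a1"),
    ("A", datetime(2001, 3, 20), "a2"),
    ("B", datetime(2001, 4, 1), "b1"),
]
print(monthly_snapshots(revs))
# {'2001-03': {'A': 'a2'}, '2001-04': {'A': 'a2', 'B': 'b1'}}
```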
Example heuristics [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized, e.g., President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters, e.g., UNICEF and WHO are named entities
3) Titles of which at least 75% of the occurrences in the article text itself are capitalized
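The three heuristics can be sketched as a single predicate over a title and its article text. This is an illustrative reading, not the original implementation: the list of non-content words is a hypothetical stand-in, and applying rule 3 only to single-word titles that fail rule 2 is an assumption about how the rules combine.

```python
import re

# Hypothetical list of non-content words skipped when checking rule 1
FUNCTION_WORDS = {"of", "the", "a", "an", "and", "or", "in", "on", "for", "to", "at", "by", "with"}

def is_named_entity(title: str, article_text: str) -> bool:
    """Apply the three capitalization heuristics to a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: every content word of a multi-word title is capitalized
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with more than one capital letter (e.g. UNICEF)
    if sum(c.isupper() for c in word) > 1:
        return True
    # Rule 3: at least 75% of the title's occurrences in its own article are capitalized
    hits = re.findall(r"\b" + re.escape(word) + r"\b", article_text, flags=re.IGNORECASE)
    if not hits:
        return False
    return sum(h[0].isupper() for h in hits) / len(hits) >= 0.75

print(is_named_entity("President_of_the_United_States", ""))  # True
print(is_named_entity("Apple", "An apple a day; apple pie"))  # False
```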
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_i$: $(e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links.
Step 2: Accumulate the entity-synonym relationships of all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is an anchor text linking to the article President_of_the_United_States.
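Anchor-text extraction for one snapshot can be sketched with a regular expression over piped wiki links of the form shown in the example. This is a simplified illustration: it only handles `[[Target|anchor]]` links (unpiped links, section anchors such as `Target#Section` and case folding of the first title letter are ignored), and the page dictionary and entity set are hypothetical inputs.

```python
import re
from collections import Counter
from typing import Dict, Set

# Piped internal links: [[Target article|anchor text]]
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")

def synonym_snapshot(pages: Dict[str, str], entities: Set[str]) -> Counter:
    """Count (entity, anchor text) pairs over all pages of one snapshot W_tk."""
    pairs: Counter = Counter()
    for wikitext in pages.values():
        for target, anchor in PIPED_LINK.findall(wikitext):
            target = target.strip().replace(" ", "_")  # title form used for entities
            if target in entities:
                pairs[(target, anchor.strip())] += 1
    return pairs

pages = {"X": "[[President_of_the_United_States|Barack Obama]] met "
              "[[President of the United States|Obama]] in [[Paris|the city]]."}
print(synonym_snapshot(pages, {"President_of_the_United_States"}))
```

The counts kept per snapshot are what the frequency-based rankings (TDP, and the averaged tf in TIDP) can later be computed from.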
Extracting synonyms
Output: entity-synonym relationships and their time periods
Named Entity            Synonym                     Time Period
Pope Benedict XVI       Cardinal Joseph Ratzinger   05/2005 - 03/2009
                        Joseph Ratzinger            05/2005 - 03/2009
                        Pope Benedict XVI           05/2005 - 03/2009
Barack Obama            Barack Hussein Obama II     02/2007 - 03/2009
                        Sen. Barack Obama           07/2007 - 03/2009
                        Senator Barack Obama        05/2006 - 03/2009
Hillary Rodham Clinton  Hillary Clinton             08/2003 - 03/2009
                        Sen. Hillary Clinton        03/2007 - 03/2009
                        Senator Clinton             11/2007 - 03/2009
The time periods of synonyms are derived from the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not from temporal expressions extracted from the article contents.
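Since the time period of a relationship comes from snapshot timestamps, it can be sketched as the first and last snapshot in which the pair occurs. Two simplifications are assumed here: months are written as 'YYYY-MM' (so strings sort chronologically, unlike the MM/YYYY display format in the table) and gaps are ignored, so a pair that disappears and returns still gets one period.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (named entity, synonym)

def time_periods(snapshots: Dict[str, Set[Pair]]) -> Dict[Pair, Tuple[str, str]]:
    """Map each (entity, synonym) pair to the first and last snapshot month it appears in."""
    periods: Dict[Pair, Tuple[str, str]] = {}
    for month in sorted(snapshots):               # 'YYYY-MM' strings sort chronologically
        for pair in snapshots[month]:
            first = periods[pair][0] if pair in periods else month
            periods[pair] = (first, month)
    return periods

pair = ("Pope Benedict XVI", "Joseph Ratzinger")
print(time_periods({"2005-05": {pair}, "2006-01": set(), "2009-03": {pair}}))
# {('Pope Benedict XVI', 'Joseph Ratzinger'): ('2005-05', '2009-03')}
```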
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
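The experiments use Kleinberg's own implementation. As a from-scratch illustration (not that code), here is a minimal two-state version of the batched burst model, assuming the NYT stream is grouped into time batches (e.g. months) where r[t] articles mention a synonym out of d[t] articles in total. The settings s = 2 and gamma = 1 mirror those listed in the experiment setting; the transition cost gamma·ln(n) for moving up, the free move down, and the burst weight as the summed cost difference between the two states over an interval follow Kleinberg's formulation.

```python
import math
from typing import List, Tuple

def _cost(r: int, d: int, p: float) -> float:
    # Negative log-likelihood of r relevant documents out of d at rate p
    # (the binomial coefficient is omitted: it is the same for both states)
    return -(r * math.log(p) + (d - r) * math.log(1 - p))

def bursts(r: List[int], d: List[int], s: float = 2.0,
           gamma: float = 1.0) -> Tuple[List[int], List[Tuple[int, int, float]]]:
    """Two-state batched burst detection; returns (states, [(start, end, weight)])."""
    n = len(r)
    p0 = sum(r) / sum(d)          # base rate; assumes 0 < p0 < 1
    p1 = min(s * p0, 0.9999)      # burst rate: s times the base rate
    up = gamma * math.log(n)      # cost of moving up into the burst state
    best = [_cost(r[0], d[0], p0), up + _cost(r[0], d[0], p1)]
    back = []
    for t in range(1, n):
        c0, b0 = min((best[0], 0), (best[1], 1))        # moving down is free
        c1, b1 = min((best[0] + up, 0), (best[1], 1))
        back.append((b0, b1))
        best = [c0 + _cost(r[t], d[t], p0), c1 + _cost(r[t], d[t], p1)]
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for b in reversed(back):                            # Viterbi backtrace
        state = b[state]
        states.append(state)
    states.reverse()
    intervals, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            weight = sum(_cost(r[i], d[i], p0) - _cost(r[i], d[i], p1)
                         for i in range(start, t))
            intervals.append((start, t - 1, weight))
        else:
            t += 1
    return states, intervals

states, found = bursts([1, 1, 1, 10, 10, 10, 1, 1, 1], [100] * 9)
print(states)  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```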
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$\mathrm{TIDP}(s_j) = \mu \cdot \mathrm{pf}(s_j) + (1 - \mu) \cdot \overline{\mathrm{tf}}(s_j)$
where $\mathrm{pf}(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs,
$\overline{\mathrm{tf}}(s_j) = \frac{\sum_i \mathrm{tf}(s_j, p_i)}{\mathrm{pf}(s_j)}$ is the averaged tf of $s_j$ over all time partitions $p_i$,
and $\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
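A worked instance of the TIDP mixture, as a sketch: the per-partition counts are toy values, and the features are used raw, exactly as the formula is written; the slides do not say whether pf and the averaged tf are normalised before mixing, so without normalisation the frequency term can dominate.

```python
from typing import Dict

def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j), with raw (unnormalised) features."""
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)                  # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf         # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

# Toy counts: spread over two partitions vs. concentrated in one
print(tidp({"2007-01": 4, "2008-01": 2}))  # 0.5 * 2 + 0.5 * 3 = 2.5
print(tidp({"2008-01": 10}))               # 0.5 * 1 + 0.5 * 10 = 5.5
```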
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity $g$ = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$.
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$.
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Figure: A snapshot of Wikipedia and the current revisions at time $t_k$.
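The partitioning in Steps 1 and 2 can be sketched as below, assuming the revision history is available as (title, timestamp, text) records and that a snapshot $W_{t_k}$ keeps the latest revision of each page at or before $t_k$, as in the figure. `Revision`, `month_starts`, `build_snapshot` and `partition` are hypothetical names, and the sketch rescans all revisions for every month for clarity rather than streaming a real dump.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """One revision of a Wikipedia page (hypothetical record type)."""
    title: str
    timestamp: datetime
    text: str


def month_starts(first: datetime, last: datetime) -> list[datetime]:
    """Time points t_1, ..., t_z: the first day of every month from first to last."""
    points = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        points.append(datetime(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def build_snapshot(revisions: list[Revision], t_k: datetime) -> dict[str, Revision]:
    """W_tk: the current (latest) revision of every page as of time t_k."""
    snapshot: dict[str, Revision] = {}
    for rev in revisions:
        if rev.timestamp <= t_k:
            current = snapshot.get(rev.title)
            if current is None or rev.timestamp > current.timestamp:
                snapshot[rev.title] = rev
    return snapshot


def partition(revisions: list[Revision], first: datetime,
              last: datetime) -> dict[datetime, dict[str, Revision]]:
    """W = {W_t1, ..., W_tz} with a granularity of one month."""
    return {t_k: build_snapshot(revisions, t_k) for t_k in month_starts(first, last)}
```

For example, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` returns 85 time points, one per monthly snapshot from 03/2001 to 03/2008.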
Example [Bunescu and Pasca, EACL'2006]:
1) Multi-word titles in which all content words are capitalized: President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters: UNICEF and WHO are named entities
3) At least 75% of the title's occurrences in the article text itself are capitalized
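The three rules can be approximated as follows. This is a sketch, not Bunescu and Pasca's code: the function-word list that lets rule 1 accept President_of_the_United_States, the regex tokenization, and skipping occurrences right after a sentence end in rule 3 are all assumptions, and `is_named_entity` is a hypothetical name.

```python
import re

# Hypothetical, minimal list of function words that rule 1 ignores.
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "by", "at"}


def is_named_entity(title: str, article_text: str) -> bool:
    """Approximate the three capitalization rules for a Wikipedia title."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        return bool(content) and all(w[0].isupper() for w in content)
    word = words[0]
    # Rule 2: single-word title with at least two capital letters (UNICEF, WHO).
    if sum(ch.isupper() for ch in word) >= 2:
        return True
    # Rule 3: at least 75% of the title's occurrences in the article text are
    # capitalized, skipping occurrences that directly follow a sentence end.
    pattern = re.compile(r"(?<![.!?]\s)\b(" + re.escape(word) + r")\b", re.IGNORECASE)
    hits = [m.group(1) for m in pattern.finditer(article_text)]
    if not hits:
        return False
    capitalized = sum(h[0].isupper() for h in hits)
    return capitalized / len(hits) >= 0.75
```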
Example:
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_i = (e_i, s_j)$, i.e. (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting the anchor texts of article links.
Step 2: Accumulate the entity-synonym relationships for all entities at time $t_k$, i.e. a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
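Anchor-text extraction can be sketched with a regular expression over piped links of the form `[[Target|anchor text]]`. It assumes raw wiki markup per page and ignores templates, redirects and unpiped links; `PIPED_LINK` and `synonym_snapshot` are hypothetical names. Each returned pair is one relationship $\xi = (e_i, s_j)$, and the set for one snapshot plays the role of $S_{t_k}$.

```python
import re

# Piped wiki link [[Target|anchor text]]; an optional #section after the target is dropped.
PIPED_LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")


def synonym_snapshot(pages: dict[str, str], entities: set[str]) -> set[tuple[str, str]]:
    """S_tk: every (e_i, s_j) pair where anchor text s_j links to entity page e_i."""
    pairs: set[tuple[str, str]] = set()
    for text in pages.values():
        for target, anchor in PIPED_LINK.findall(text):
            # Wiki titles treat spaces and underscores alike; normalize to underscores.
            title = target.strip().replace(" ", "_")
            if title in entities:
                pairs.add((title, anchor.strip()))
    return pairs
```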
Output: entity-synonym relationships and their time periods
Named Entity             Synonym                      Time Period
Pope Benedict XVI        Cardinal Joseph Ratzinger    05/2005 - 03/2009
Pope Benedict XVI        Joseph Ratzinger             05/2005 - 03/2009
Pope Benedict XVI        Pope Benedict XVI            05/2005 - 03/2009
Barack Obama             Barack Hussein Obama II      02/2007 - 03/2009
Barack Obama             Sen. Barack Obama            07/2007 - 03/2009
Barack Obama             Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton   Hillary Clinton              08/2003 - 03/2009
Hillary Rodham Clinton   Sen. Hillary Clinton         03/2007 - 03/2009
Hillary Rodham Clinton   Senator Clinton              11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the article contents.
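Because periods come from article timestamps, the time period of each relationship can be sketched as the first and last monthly snapshot that contains it. Collapsing all occurrences into one contiguous interval is an assumption (gaps are ignored), and `synonym_periods` is a hypothetical helper.

```python
from datetime import date


def synonym_periods(
    snapshots: dict[date, set[tuple[str, str]]],
) -> dict[tuple[str, str], tuple[date, date]]:
    """Map each (entity, synonym) pair to (first month seen, last month seen)."""
    periods: dict[tuple[str, str], tuple[date, date]] = {}
    for t_k in sorted(snapshots):              # visit snapshots in time order
        for pair in snapshots[t_k]:
            start, _ = periods.get(pair, (t_k, t_k))
            periods[pair] = (start, t_k)       # keep the first start, extend the end
    return periods
```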
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods:
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e. periods of occurrence and their intensity
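A minimal two-state version of Kleinberg's batched burst model is sketched below: mentions are grouped into batches (assumed here to be months of NYT articles), the burst state fires at `s` times the base rate, and entering it costs `gamma * ln(n)`. The defaults mirror the settings reported in the experiment setting (2 states, rate ratio 2, gamma 1), but this is an illustrative re-implementation, not the Kleinberg code the authors used, and `detect_bursts` is a hypothetical name.

```python
import math


def detect_bursts(relevant: list[int], totals: list[int], s: float = 2.0,
                  gamma: float = 1.0) -> list[tuple[int, int, float]]:
    """Two-state batched burst detection in the style of Kleinberg (KDD 2002).

    relevant[t]: documents in batch t that mention the synonym;
    totals[t]: all documents in batch t.
    Returns (first batch, last batch, weight) for each burst.
    """
    n = len(totals)
    p0 = sum(relevant) / sum(totals)              # base state q0: the overall rate
    if p0 == 0 or p0 >= 1:
        return []
    p1 = min(s * p0, 0.9999)                      # burst state q1: s times the base rate
    rates = (p0, p1)

    def cost(i: int, t: int) -> float:
        # -ln of the binomial probability of relevant[t] hits out of totals[t] at rate p_i.
        r, d, p = relevant[t], totals[t], rates[i]
        log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

    up = gamma * math.log(n)                      # entering q1 costs gamma * ln n; leaving is free
    best = [cost(0, 0), up + cost(1, 0)]          # the automaton starts in q0
    back: list[tuple[int, int]] = []              # back[t-1][j]: best predecessor of state j at t
    for t in range(1, n):
        prev0 = 0 if best[0] <= best[1] else 1
        prev1 = 1 if best[1] <= best[0] + up else 0
        back.append((prev0, prev1))
        best = [min(best[0], best[1]) + cost(0, t),
                min(best[1], best[0] + up) + cost(1, t)]

    # Trace back the minimum-cost state sequence (Viterbi).
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for t in range(n - 1, 0, -1):
        state = back[t - 1][state]
        states.append(state)
    states.reverse()

    # Each maximal run of q1 is a burst; its weight is the cost saved by being in q1.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

With 100 articles per batch and mention counts `[5, 5, 5, 5, 30, 35, 30, 5, 5, 5]`, the sketch reports a single burst covering batches 4 to 6 with a positive weight.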
Output: results from the burst-detection algorithm
Synonym            Entity                   Burst Weight   Start     End
President Reagan   Ronald Reagan            5506858        01/1987   02/1989
President Ronald   Ronald Reagan            100401         01/1989   03/1990
President Ronald   Ronald Reagan            67208          07/1990   02/1993
Senator Clinton    Hillary Rodham Clinton   18214          01/2001   10/2001
Senator Clinton    Hillary Rodham Clinton   17732          05/2002   01/2003
Senator Clinton    Hillary Rodham Clinton   172356         06/2003   11/2004
Classifying synonyms into two types
Definition
Class A: time-independent. Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B: time-dependent. Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs.
$tf(s_j)$ is the average term frequency of $s_j$ over the time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$.
$\mu$ sets the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e. the more partitions a synonym occurs in, the more robust to time it is.
High usage over time, i.e. a high average frequency over time.
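A direct transcription of the TIDP formula, assuming per-partition term frequencies are already counted. The slide does not say whether $pf$ and the averaged $tf$ are normalized before mixing, so this sketch combines the raw values; `tidp` and `rank_time_independent` are hypothetical names.

```python
from typing import Optional


def tidp(tf_by_partition: dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for one synonym s_j.

    tf_by_partition maps each time partition p_i to tf(s_j, p_i).
    """
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)                     # pf(s_j): partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf            # tf(s_j) = sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms: dict[str, dict[str, int]], mu: float = 0.5,
                          k: Optional[int] = None) -> list[tuple[str, float]]:
    """Sort synonyms by TIDP score, highest first; keep the top k if k is given."""
    scored = sorted(((s, tidp(parts, mu)) for s, parts in synonyms.items()),
                    key=lambda pair: pair[1], reverse=True)
    return scored if k is None else scored[:k]
```

With μ = 0.5, a synonym occurring in two partitions with frequencies 4 and 6 scores 0.5 · 2 + 0.5 · 5 = 3.5, while one occurring in a single partition with frequency 12 scores 0.5 · 1 + 0.5 · 12 = 6.5.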
Ranking time-dependent synonyms
Definition: given a time $t_k$, the time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered because only the synonyms in a particular time period $t_k$ are of interest.
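Ranking by TDP reduces to sorting by frequency at $t_k$. In this sketch `rank_time_dependent` is a hypothetical name, time periods are plain string keys, and the alphabetical tie-break is an added assumption.

```python
def rank_time_dependent(tf_by_time: dict[str, dict[str, int]], t_k: str,
                        k: int = 3) -> list[tuple[str, int]]:
    """Top-k synonyms at time t_k ranked by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_by_time maps each synonym s_j to {time period: term frequency}.
    Synonyms that do not occur at t_k are dropped; ties are broken alphabetically.
    """
    scores = [(s, tfs.get(t_k, 0)) for s, tfs in tf_by_time.items()]
    ranked = sorted((pair for pair in scores if pair[1] > 0),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]
```

For made-up counts such as `{"Cardinal Ratzinger": {"2004": 9}, "Joseph Ratzinger": {"2004": 3}}`, querying `t_k = "2004"` ranks "Cardinal Ratzinger" first.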
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e. 85 monthly snapshots (01/03/2001, ..., 01/03/2008), about 2.8 terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper); Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004): 250 topics (topics 301-450 and 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org}\; s_1{\wedge}w_1\; s_2{\wedge}w_2 \cdots s_k{\wedge}w_k$$
Measurement: Mean Average Precision (MAP), R-precision and Recall
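Building $q_{exp}$ can be sketched as string construction, assuming a `term^weight` boosting syntax like the one in Terrier's query language. Splitting each multi-word synonym into individually boosted lowercase terms is an assumption (the slide does not say how phrases were handled), and `expand_query` is a hypothetical name.

```python
import re


def expand_query(q_org: str, synonyms: list[tuple[str, float]], k: int = 3) -> str:
    """Append the top-k synonyms to q_org, each term boosted by its synonym's TIDP score."""
    top_k = sorted(synonyms, key=lambda pair: pair[1], reverse=True)[:k]
    boosted = [f"{term}^{weight:g}"
               for synonym, weight in top_k
               for term in re.findall(r"\w+", synonym.lower())]  # drops punctuation like "Sen."
    return " ".join([q_org] + boosted)
```

For example, `expand_query("barack obama", [("Barack Hussein Obama II", 3.5), ("Sen. Barack Obama", 6.5)], k=1)` returns `"barack obama sen^6.5 barack^6.5 obama^6.5"`.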
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a temporal query is a named entity plus a time period)
Named Entity                    Time Period   Synonym
American Broadcasting Company   1995-2000     Disney/ABC
Barack Obama                    2005-2007     Senator Obama
Eminem                          1999-2004     Slim Shady
George H. W. Bush               1988-1992     President George H.W. Bush
George W. Bush                  2000-2007     President George W. Bush
Hillary Rodham Clinton          2001-2007     Senator Clinton
Kmart                           1987-1987     Kresge
Pope Benedict XVI               1988-2005     Cardinal Ratzinger
Ronald Reagan                   1987-1989     Reagan Revolution
Virgin Media                    1999-2002     Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method   NE          NE-Syn      Avg. Syn per NE   Accuracy (%)
BPF-NERW     2,574,319   3,199,115   1.2               51
BPCF-NERW    473,829     488,383     1.0               73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
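The average number of synonyms per named entity follows from the two count columns (which is why the flattened values 12 and 10 read as 1.2 and 1.0):

$$\frac{3{,}199{,}115}{2{,}574{,}319} \approx 1.24, \qquad \frac{488{,}383}{473{,}829} \approx 1.03$$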
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
Number of queries under the two NERQ methods
Type               MW-NERQ   MRW-NERQ
Named entity       42        149
Not named entity   208       101
Total              250       250
Query expansion using time-independent synonyms
MAP, R-precision and Recall ( indicates statistically significant at p < 0.05)
Query NER   MW-NERQ                        MRW-NERQ
Method      MAP     R-precision  Recall    MAP     R-precision  Recall
PM          0.2889  0.3309       0.6185    0.2455  0.2904       0.5629
PRF         0.3469  0.3711       0.6944    0.3002  0.3227       0.6761
SQE-PRF     0.3608  0.3652       0.7405    0.2507  0.2665       0.5932
SWQE-PRF    0.3653  0.3861       0.7388    0.2885  0.3080       0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e. Bose-Einstein 1
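Reading the flattened table values as fractions (0.2889 rather than 2889), the relative MAP change of SWQE-PRF over the two baselines can be computed as below. This is only arithmetic on the reported means, not a significance test; `relative_change` is a hypothetical helper.

```python
# MAP values from the table, read as fractions.
MAP = {
    "MW-NERQ": {"PM": 0.2889, "PRF": 0.3469, "SQE-PRF": 0.3608, "SWQE-PRF": 0.3653},
    "MRW-NERQ": {"PM": 0.2455, "PRF": 0.3002, "SQE-PRF": 0.2507, "SWQE-PRF": 0.2885},
}


def relative_change(new: float, base: float) -> float:
    """Percentage change of new relative to base."""
    return 100.0 * (new - base) / base


for setting, scores in MAP.items():
    for baseline in ("PM", "PRF"):
        change = relative_change(scores["SWQE-PRF"], scores[baseline])
        print(f"{setting}: SWQE-PRF vs {baseline}: {change:+.1f}% MAP")
```

It prints +26.4% and +5.3% for MW-NERQ (versus PM and PRF), and +17.5% and -3.9% for MRW-NERQ, so under MRW-NERQ the synonym-weighted run does not improve on PRF alone.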
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)
Method   P@10     P@20     P@30
TQ       0.1000   0.0500   0.0333
TSQ      0.5200   0.3800   0.2800
TQ: search a Temporal Query, i.e. a keyword $w_q$ and a time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms with respect to time $t_q$
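TSQ can be sketched as a filter over timed entity-synonym pairs: a synonym is added only when its period covers the query time $t_q$. Year-level periods, exact entity matching and returning a plain term list (rather than an engine-specific query) are assumptions; `TimedSynonym` and `temporal_synonym_query` are hypothetical names, and the two example pairs come from the temporal-query examples listed in the experiment setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSynonym:
    """An entity-synonym pair valid from start_year to end_year (inclusive)."""
    entity: str
    synonym: str
    start_year: int
    end_year: int


def temporal_synonym_query(keyword: str, t_q: int, synonyms: list[TimedSynonym]) -> list[str]:
    """TSQ: the keyword w_q plus every synonym whose period covers the query time t_q."""
    expanded = [keyword]
    for s in synonyms:
        if s.entity == keyword and s.start_year <= t_q <= s.end_year:
            expanded.append(s.synonym)
    return expanded


EXAMPLES = [
    TimedSynonym("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    TimedSynonym("Barack Obama", "Senator Obama", 2005, 2007),
]
```

For example, a query for "Pope Benedict XVI" at 1990 is expanded with "Cardinal Ratzinger", while the same query at 2008 is left unexpanded.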
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Figure A snapshot of Wikipediaand current revisions at time tk
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized
President_of_the_United_StatesrArr named entity
2) Single-word titles with multiple capitalletters
UNICEF and WHO are namedentities
3) 75 of occurrences in the article textitself are capitalized
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting the anchor texts of article links.
Step 2: Accumulate the entity-synonym relationships of all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$.
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States.
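Step 1 can be sketched with a regular expression over raw wiki markup. `extract_synonyms` is a hypothetical helper: it only handles simple `[[Target]]` and `[[Target|anchor]]` links, normalizes spaces in targets to underscores, drops section anchors (`#...`), and ignores nested links, templates and redirects.

```python
import re
from collections import Counter

# Matches [[Target]] and [[Target|anchor text]] (no nesting).
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(wikitext, entities):
    """Count (entity, anchor text) pairs for links whose target is a known entity."""
    pairs = Counter()
    for target, anchor in WIKILINK.findall(wikitext):
        title = target.split("#")[0].strip().replace(" ", "_")
        if title in entities and anchor.strip():
            pairs[(title, anchor.strip())] += 1
    return pairs


text = ("[[President_of_the_United_States|Barack Obama]] met the "
        "[[United Nations|UN]]; see also [[Paris]].")
print(extract_synonyms(text, {"President_of_the_United_States", "United_Nations"}))
```

Running this per snapshot and merging the counters gives one synonym snapshot $S_{t_k}$.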
Extracting synonyms
Output: entity-synonym relationships and their time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (covering 8 years) in which they appear, not temporal expressions extracted from the article contents.
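One way to turn the synonym snapshots $S_{t_k}$ into time periods like those above is to record, for each entity-synonym pair, the first and last monthly snapshot in which it appears. The hypothetical `time_periods` helper below assumes a pair's period runs continuously from its first to its last sighting, even if it is missing from some snapshots in between; the input data is made up for illustration.

```python
def time_periods(synonym_snapshots):
    """Map each (entity, synonym) pair to its (first, last) month of appearance.

    synonym_snapshots: {(year, month): iterable of (entity, synonym)}, one S_tk per t_k.
    """
    periods = {}
    for month in sorted(synonym_snapshots):
        for pair in synonym_snapshots[month]:
            first = periods[pair][0] if pair in periods else month
            periods[pair] = (first, month)
    return periods


def fmt(month):
    """Format (year, month) as MM/YYYY, as in the output table."""
    year, m = month
    return f"{m:02d}/{year}"


snaps = {
    (2005, 5): [("Pope Benedict XVI", "Joseph Ratzinger")],
    (2007, 2): [("Pope Benedict XVI", "Joseph Ratzinger")],
    (2009, 3): [("Pope Benedict XVI", "Joseph Ratzinger")],
}
for (entity, syn), (start, end) in time_periods(snaps).items():
    print(entity, "|", syn, "|", fmt(start), "-", fmt(end))
# Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
```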
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst-detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{i,j}$ by computing its rate of occurrence in the document stream
Output: bursty intervals and burst weights, i.e., periods of occurrence and their intensity
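The burst-detection step can be sketched with the two-state, batched form of Kleinberg's automaton: each month is a batch in which $r_t$ of $d_t$ NYT articles mention a synonym, the base state emits at the overall rate $p_0$, the bursty state at $s \cdot p_0$, and moving up a state costs $\gamma \ln n$. Defaults follow the settings listed in the experiment setting (2 states, ratio 2, gamma 1). This is an independent sketch, not Kleinberg's implementation that the authors used; the cap on the elevated rate is an assumption.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched burst detection, after Kleinberg (KDD 2002).

    r[t]: documents in batch t that mention the synonym; d[t]: all documents in batch t.
    Returns (start, end, weight) for each maximal run of batches in the bursty state.
    """
    n = len(r)
    if n == 0 or sum(d) == 0:
        return []
    p0 = sum(r) / sum(d)                  # base rate over the whole stream
    if p0 == 0 or p0 >= 1:
        return []
    p = [p0, min(s * p0, 0.9999)]         # state 1 fires s times faster (capped below 1)
    up = gamma * math.log(n)              # cost of moving from state 0 up to state 1

    def cost(i, t):
        # -log binomial likelihood of r[t] out of d[t] at rate p[i]; the binomial
        # coefficient is identical for both states, so it is dropped.
        return -(r[t] * math.log(p[i]) + (d[t] - r[t]) * math.log(1 - p[i]))

    # Viterbi: the automaton starts in state 0; moving down is free.
    prev, back = [0.0, math.inf], []
    for t in range(n):
        cur, ptr = [], []
        for j in (0, 1):
            options = [prev[i] + (up if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if options[0] <= options[1] else 1
            cur.append(options[i_best] + cost(j, t))
            ptr.append(i_best)
        prev = cur
        back.append(ptr)

    # Backtrack the cheapest state sequence.
    states = [0] * n
    state = 0 if prev[0] <= prev[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Burst weight: total cost saved by the bursty state over the base state.
    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts


monthly = [1, 1, 1, 10, 12, 11, 1, 1, 1, 1]   # hypothetical mentions per month
print(kleinberg_bursts(monthly, [100] * 10))   # one burst covering months 3-5
```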
Improving the accuracy of time using burst detection
Output: Results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
3 Query Expansion
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided).
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered.
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$
$pf(s_j)$ is the time-partition frequency of $s_j$, i.e., the number of time partitions in which $s_j$ occurs.
$tf(s_j)$ is the averaged term frequency of $s_j$ over the time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
$\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments.
Intuition: The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is.
High usage over time, i.e., a high average frequency over time.
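A direct transcription of the TIDP formula, assuming a synonym's term frequencies are given per time partition as a hypothetical `{partition: tf}` dict. As written, $pf$ and $tf$ are mixed without normalization, so the raw scale of each feature affects the score. Worked example: tf values 4, 2 and 6 in three partitions give $pf = 3$, $tf = 12/3 = 4$, and $TIDP = 0.5 \cdot 3 + 0.5 \cdot 4 = 3.5$.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for a single synonym s_j.

    tf_by_partition: {partition: tf(s_j, p_i)}; partitions with tf 0 do not count.
    """
    counts = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(counts)                      # number of partitions containing s_j
    if pf == 0:
        return 0.0
    avg_tf = sum(counts) / pf             # averaged tf over those partitions
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms, mu=0.5, k=None):
    """Return the top-k (synonym, TIDP) pairs, highest score first."""
    scored = [(s, tidp(parts, mu)) for s, parts in synonyms.items()]
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:k]


synonyms = {  # hypothetical per-year term frequencies
    "Barack Hussein Obama II": {2007: 4, 2008: 2, 2009: 6},
    "Sen. Barack Obama": {2007: 2},
}
print(rank_time_independent(synonyms))
# [('Barack Hussein Obama II', 3.5), ('Sen. Barack Obama', 1.5)]
```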
Ranking time-dependent synonyms
Definition: Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$TDP(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: Only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
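Ranking by TDP then reduces to sorting the synonyms observed at $t_k$ by their term frequency there. In this sketch the `{synonym: {time: tf}}` layout, the yearly granularity and the alphabetical tie-break are assumptions made for illustration.

```python
def rank_time_dependent(tf_at, t_k, k=None):
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at: {synonym: {time: tf}}; synonyms with no occurrence at t_k are dropped.
    """
    scored = [(s, tf.get(t_k, 0)) for s, tf in tf_at.items()]
    scored = [(s, score) for s, score in scored if score > 0]
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:k]


tf_at = {  # hypothetical yearly term frequencies
    "Cardinal Joseph Ratzinger": {2004: 7, 2006: 1},
    "Joseph Ratzinger": {2004: 3, 2006: 2},
    "Pope Benedict XVI": {2006: 12},
}
print(rank_time_dependent(tf_at, 2004))
# [('Cardinal Joseph Ratzinger', 7), ('Joseph Ratzinger', 3)]
```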
4 Evaluation
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time periods
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst-detection algorithm as implemented by Kleinberg: number of states 2; ratio of the rate of the second state to the base state 2; ratio of the rate of each subsequent state to the previous state 2; gamma parameter of the HMM 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$q_{exp} = q_{org}\; s_1{\wedge}w_1\; s_2{\wedge}w_2 \ldots s_k{\wedge}w_k$
Measurement: Mean Average Precision (MAP), R-precision and Recall
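A sketch of building $q_{exp}$ from already-ranked synonyms. The `^weight` suffix mirrors the boosting notation above, but quoting multi-word synonyms as phrases and the two-decimal formatting are assumptions; check the query language of the engine in use (here, Terrier) before relying on this exact syntax.

```python
def expand_query(q_org, ranked_synonyms, k=3):
    """Append the top-k synonyms to q_org, each boosted by its TIDP weight.

    ranked_synonyms: [(synonym, weight), ...] sorted by decreasing weight.
    Multi-word synonyms are quoted as phrases (an assumption about the engine).
    """
    parts = [q_org]
    for syn, weight in ranked_synonyms[:k]:
        term = f'"{syn}"' if " " in syn else syn
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


print(expand_query("Barack Obama",
                   [("Barack Hussein Obama II", 3.5), ("Obama", 1.0)]))
# Barack Obama "Barack Hussein Obama II"^3.50 Obama^1.00
```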
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
Select 20 strongly time-dependent queries
Measurement: Precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (a temporal query consists of a named entity and a time period):
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney-ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
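For reference, the per-query measures used in these experiments can be computed as below (standard textbook definitions; MAP is the mean of `average_precision` over all queries). P@k divides by k even when fewer than k documents are retrieved. The document IDs are made up.

```python
def precision_at(ranked, relevant, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """AP: sum of P@i at the ranks i of relevant hits, divided by |relevant|."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """R-precision: P@R, where R is the number of relevant documents."""
    return precision_at(ranked, relevant, len(relevant)) if relevant else 0.0


def recall(ranked, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(set(ranked) & set(relevant)) / len(relevant) if relevant else 0.0


ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}
print(precision_at(ranked, relevant, 2), round(average_precision(ranked, relevant), 4))
# 0.5 0.5556
```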
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | No. of NE | No. of NE-Syn | Avg. No. of Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
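As a quick arithmetic check, the average number of synonyms per named entity is the number of entity-synonym relationships divided by the number of named entities, which gives about 1.2 for BPF-NERW and 1.0 for BPCF-NERW. The helper name is illustrative.

```python
def avg_synonyms_per_ne(num_ne, num_ne_syn):
    """Average number of entity-synonym relationships per named entity."""
    return num_ne_syn / num_ne


for method, ne, ne_syn in [("BPF-NERW", 2_574_319, 3_199_115),
                           ("BPCF-NERW", 473_829, 488_383)]:
    print(method, round(avg_synonyms_per_ne(ne, ne_syn), 1))
# BPF-NERW 1.2
# BPCF-NERW 1.0
```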
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries using the two different NERQ methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and Recall (marked results are statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (marked results are statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
QUEST: Query Expansion using Synonyms over Time
5 Conclusions
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining and clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example[Bunescu and Pasca EACLrsquo2006]1) Multi-word titles and all words arecapitalized
President_of_the_United_StatesrArr named entity
2) Single-word titles with multiple capitalletters
UNICEF and WHO are namedentities
3) 75 of occurrences in the article textitself are capitalized
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Recognizing named entities
Step 1 Partition Wikipediaregarding to the time granularityg = month to obtain itssnapshots W = Wt1 Wtz
Step 2 For each snapshotWtk isin W identify named entitypages to obtain a set of namedentities Etk = e1 ej
Step 3 For each name entityei isin Etk find a set ofentity-synonym relationshipsStk = ξ11 ξnm
Example
ei President_of_the_United_States
tk 112001
sj ldquoGeorge W Bushrdquo
ξi (ei sj ) or(President_of_the_United_StatesldquoGeorge W Bushrdquo)
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
$tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: Only term frequencies are used to measure the importance of synonyms
Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest
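A minimal Python sketch of TDP ranking. It assumes a hypothetical mapping from each synonym to its term frequency per time partition; the labels and counts are made up. The weight is simply the frequency at $t_k$, and synonyms that do not occur at $t_k$ are dropped.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(synonyms: Dict[str, Dict[str, int]], t_k: str,
                        k: int) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    # A synonym that does not occur at t_k has tf = 0 and is not a candidate.
    scored = [(s, tfs.get(t_k, 0)) for s, tfs in synonyms.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical counts for synonyms of "Pope Benedict XVI".
synonyms = {
    "Cardinal Joseph Ratzinger": {"2004-11": 7, "2005-05": 2},
    "Joseph Ratzinger": {"2004-11": 3, "2005-05": 9},
}
print(rank_time_dependent(synonyms, "2004-11", k=2))
# [('Cardinal Joseph Ratzinger', 7), ('Joseph Ratzinger', 3)]
```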
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 monthly snapshots from 01/03/2001 to 01/03/2008, about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of rate of second state to base state = 2; ratio of rate of each subsequent state to previous state = 2; gamma parameter of the HMM = 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights $w_1, \ldots, w_k$:
$$q_{exp} = q_{org}\; s_1^{\wedge w_1}\; s_2^{\wedge w_2} \cdots s_k^{\wedge w_k}$$
Measurement: Mean Average Precision (MAP), R-precision and Recall
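A Python sketch of building $q_{exp}$ from an original query and a list of synonyms already ranked by TIDP. Rendering each boost as `term^weight` and quoting multi-word synonyms as phrases are assumptions made for illustration. Check the query language of the engine actually used (here Terrier) before relying on this exact syntax.

```python
from typing import List, Tuple


def expand_query(q_org: str, ranked: List[Tuple[str, float]], k: int) -> str:
    """Append the top-k synonyms to the original query, each boosted by its TIDP weight."""
    parts = [q_org]
    for synonym, weight in ranked[:k]:
        # Quote multi-word synonyms so they are kept together as a phrase (assumed syntax).
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


ranked = [("Barack Hussein Obama II", 3.5), ("Senator Obama", 1.5), ("Obama", 1.0)]
print(expand_query("Barack Obama", ranked, k=2))
# Barack Obama "Barack Hussein Obama II"^3.50 "Senator Obama"^1.50
```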
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: Precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a named entity and a time period) with their synonyms
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
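A Python sketch of how rows like those in the example table could drive the expansion of a temporal query. A synonym is added when its time period overlaps the query's time period. The overlap rule and the year-level periods are assumptions for illustration; the slides do not specify how $t_q$ is matched against a synonym's period.

```python
from typing import List, Tuple

# (named entity, synonym, first year, last year): rows copied from the example table.
SYNONYM_PERIODS: List[Tuple[str, str, int, int]] = [
    ("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    ("Barack Obama", "Senator Obama", 2005, 2007),
    ("Kmart", "Kresge", 1987, 1987),
]


def synonyms_at(entity: str, start: int, end: int) -> List[str]:
    """Synonyms of `entity` whose time period overlaps the query period [start, end]."""
    return [syn for ent, syn, lo, hi in SYNONYM_PERIODS
            if ent == entity and lo <= end and start <= hi]


def temporal_query(entity: str, start: int, end: int) -> List[str]:
    """Expand a temporal query: the keyword plus the synonyms valid during its period."""
    return [entity] + synonyms_at(entity, start, end)


print(temporal_query("Pope Benedict XVI", 1995, 2000))  # ['Pope Benedict XVI', 'Cardinal Ratzinger']
print(temporal_query("Pope Benedict XVI", 2008, 2009))  # ['Pope Benedict XVI']
```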
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | #NE | #NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with Filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
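The table's columns can be cross-checked with a short calculation. This assumes that "Avg. Syn per NE" is the ratio of the two count columns, which the slide does not state explicitly:

$$\frac{3{,}199{,}115}{2{,}574{,}319} \approx 1.24, \qquad \frac{488{,}383}{473{,}829} \approx 1.03, \qquad \frac{473{,}829}{2{,}574{,}319} \approx 0.18$$

Under that reading, the category filter in BPCF-NERW keeps about 18% of the entities. It also raises the accuracy of the time periods from 51% to 73%.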
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; values of k > 2 bring noise into the NERQ process
Number of queries recognized by the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
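The two query-side recognition methods can be sketched in Python as follows. The title match ignores case and treats underscores as spaces, which is an assumption. The ranked `related` pages for each query are a hypothetical precomputed input, because the slides do not say how the related Wikipedia pages are retrieved.

```python
from typing import Dict, List, Set


def normalize(text: str) -> str:
    """Compare queries and titles case-insensitively, treating underscores as spaces."""
    return " ".join(text.replace("_", " ").lower().split())


def mw_nerq(query: str, titles: Set[str]) -> List[str]:
    """MW-NERQ: the query is a named entity if it exactly matches a Wikipedia page title."""
    by_norm = {normalize(t): t for t in titles}
    match = by_norm.get(normalize(query))
    return [match] if match else []


def mrw_nerq(query: str, titles: Set[str], related: Dict[str, List[str]],
             k: int = 2) -> List[str]:
    """MRW-NERQ: the exact match (if any) plus the top-k related pages for the query."""
    exact = mw_nerq(query, titles)
    extra = [t for t in related.get(normalize(query), []) if t not in exact][:k]
    return exact + extra


# Hypothetical titles and related-page rankings.
titles = {"Hubble_Space_Telescope", "Space_telescope"}
related = {"hubble telescope": ["Hubble_Space_Telescope", "Space_telescope", "Edwin_Hubble"]}
print(mw_nerq("Hubble Telescope", titles))            # []
print(mrw_nerq("Hubble Telescope", titles, related))  # ['Hubble_Space_Telescope', 'Space_telescope']
```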
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance tested at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
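The relative MAP change of SWQE-PRF, computed directly from the table values:

$$\text{MW-NERQ:}\quad \frac{0.3653 - 0.2889}{0.2889} \approx +26.4\%\ \text{vs. PM}, \qquad \frac{0.3653 - 0.3469}{0.3469} \approx +5.3\%\ \text{vs. PRF}$$

$$\text{MRW-NERQ:}\quad \frac{0.2885 - 0.2455}{0.2455} \approx +17.5\%\ \text{vs. PM}, \qquad \frac{0.2885 - 0.3002}{0.3002} \approx -3.9\%\ \text{vs. PRF}$$

With exact-match recognition (MW-NERQ), TIDP-weighted synonym expansion improves on PRF. With MRW-NERQ recognition, PRF alone has the higher MAP.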
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a Temporal Query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
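The evaluation measures used in these experiments, sketched in Python for a single query given a ranked list of document ids and a set of relevant ids. The toy ids below are hypothetical. P@k divides by k even when fewer than k documents are retrieved, a common convention that is assumed here. MAP is the mean of average precision over all queries.

```python
from typing import List, Set, Tuple


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of P@i over the ranks i of retrieved relevant documents, divided by |relevant|."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranking, relevant = ["d1", "d2", "d3", "d4"], {"d1", "d3"}
print(precision_at_k(ranking, relevant, 2))  # 0.5
```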
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{i,j} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
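Step 1 can be sketched in Python as shown below. The sketch assumes the dump has already been parsed (for instance with MWDumper, listed among the tools) into hypothetical (title, timestamp, wikitext) tuples. It also assumes each monthly snapshot $W_{t_k}$ is the latest revision of every page as of the first day of the month, consistent with snapshots dated on the first of each month. Identifying named entity pages (Step 2) is not shown.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Revision = Tuple[str, datetime, str]  # (page title, revision timestamp, wikitext)


def monthly_cutoffs(first: datetime, last: datetime) -> List[datetime]:
    """First day of every month from `first` to `last` inclusive (granularity g = month)."""
    cutoffs: List[datetime] = []
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        cutoffs.append(datetime(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return cutoffs


def snapshot(revisions: Iterable[Revision], cutoff: datetime) -> Dict[str, str]:
    """W_{t_k}: the latest revision of each page made at or before `cutoff`."""
    latest: Dict[str, Tuple[datetime, str]] = {}
    for title, ts, text in revisions:
        if ts <= cutoff and (title not in latest or ts > latest[title][0]):
            latest[title] = (ts, text)
    return {title: text for title, (_, text) in latest.items()}


print(len(monthly_cutoffs(datetime(2001, 3, 1), datetime(2008, 3, 1))))  # 85
```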
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
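A simplified Python sketch of Step 1. A regular expression (an assumption, not the authors' parser) pulls `[[Target|anchor]]` and `[[Target]]` links out of wikitext and keeps the anchors whose target is a known named entity page. Nested links, templates, and Wikipedia's case-insensitive first letter are not handled.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# [[Target]], [[Target|anchor text]] or [[Target#Section|anchor text]]
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_synonyms(wikitext: str, entities: Set[str]) -> Dict[str, Set[str]]:
    """Map each linked named entity e_i to the anchor texts used for it (candidate synonyms s_j)."""
    synonyms: Dict[str, Set[str]] = defaultdict(set)
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().replace(" ", "_")  # page titles use underscores
        anchor = (match.group(2) or match.group(1)).strip()  # unpiped links: anchor = title
        if target in entities:
            synonyms[target].add(anchor)
    return dict(synonyms)


text = "He met [[President_of_the_United_States|Barack Obama]] and [[George W. Bush]] in [[Paris]]."
print(extract_synonyms(text, {"President_of_the_United_States"}))
# {'President_of_the_United_States': {'Barack Obama'}}
```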
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
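One way to derive time periods like those in the table, sketched in Python. The sketch assumes that a relationship's period runs from the first to the last monthly snapshot in which it is observed, ignoring any gaps in between. The snapshot contents in the usage lines are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def synonym_time_periods(
    snapshots: Iterable[Tuple[str, Dict[str, List[str]]]]
) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """(entity, synonym) -> (first, last) snapshot month in which the relationship was seen."""
    periods: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for month, relations in snapshots:  # months as "YYYY-MM", so string order is time order
        for entity, synonyms in relations.items():
            for syn in synonyms:
                first, last = periods.get((entity, syn), (month, month))
                periods[(entity, syn)] = (min(first, month), max(last, month))
    return periods


snaps = [
    ("2006-05", {"Barack Obama": ["Senator Barack Obama"]}),
    ("2007-02", {"Barack Obama": ["Senator Barack Obama", "Barack Hussein Obama II"]}),
    ("2009-03", {"Barack Obama": ["Senator Barack Obama", "Barack Hussein Obama II"]}),
]
periods = synonym_time_periods(snaps)
print(periods[("Barack Obama", "Senator Barack Obama")])  # ('2006-05', '2009-03')
```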
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and intensity
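A compact sketch of a two-state version of Kleinberg's batched burst model, using the parameter values listed in the experiment setting (two states, rate ratio s = 2, gamma = 1). With only two states, the "each subsequent state" ratio does not come into play. The sketch makes several assumptions. The NYT stream is grouped into time batches. `relevant[t]` counts the articles in batch t that mention the synonym, and `totals[t]` counts all articles in that batch. The overall rate lies strictly between 0 and 1. Entering the burst state costs gamma · ln(number of batches), and leaving it is free. A burst's weight is the cost saved by the burst state over its interval. This illustrates the technique; it is not the implementation used in the study.

```python
import math
from typing import List, Tuple


def state_costs(relevant: List[int], totals: List[int], s: float = 2.0) -> List[Tuple[float, float]]:
    """Per-batch cost (negative log-likelihood) under the base state q0 and the burst state q1."""
    p0 = sum(relevant) / sum(totals)  # base rate of the synonym over the whole stream
    p1 = min(s * p0, 0.9999)          # burst rate: s times the base rate, kept below 1
    costs = []
    for r, d in zip(relevant, totals):
        # The binomial coefficient C(d, r) is identical in both states, so it is dropped.
        c0 = -(r * math.log(p0) + (d - r) * math.log(1 - p0))
        c1 = -(r * math.log(p1) + (d - r) * math.log(1 - p1))
        costs.append((c0, c1))
    return costs


def burst_states(relevant: List[int], totals: List[int], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Viterbi decoding of the cheapest state sequence (0 = normal, 1 = bursty)."""
    costs = state_costs(relevant, totals, s)
    up = gamma * math.log(len(costs))        # cost of moving up from q0 to q1
    best = [costs[0][0], up + costs[0][1]]   # the sequence starts in q0
    back: List[Tuple[int, int]] = []
    for c0, c1 in costs[1:]:
        prev0 = 0 if best[0] <= best[1] else 1       # cheapest way into q0 (moving down is free)
        prev1 = 0 if best[0] + up < best[1] else 1   # cheapest way into q1
        new1 = (best[0] + up if prev1 == 0 else best[1]) + c1
        best = [best[prev0] + c0, new1]
        back.append((prev0, prev1))
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for prev0, prev1 in reversed(back):  # walk the back-pointers from the last batch
        state = prev0 if state == 0 else prev1
        states.append(state)
    return states[::-1]


def bursty_intervals(relevant: List[int], totals: List[int], s: float = 2.0,
                     gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Maximal runs of the burst state as (start batch, end batch, weight)."""
    costs = state_costs(relevant, totals, s)
    states = burst_states(relevant, totals, s, gamma)
    intervals = []
    t = 0
    while t < len(states):
        if states[t] == 1:
            start = t
            while t + 1 < len(states) and states[t + 1] == 1:
                t += 1
            # Weight: total cost saved by being in q1 rather than q0 over the interval.
            weight = sum(costs[i][0] - costs[i][1] for i in range(start, t + 1))
            intervals.append((start, t, weight))
        t += 1
    return intervals


# Hypothetical monthly counts: articles mentioning a synonym / all articles in that month.
mentions = [1, 1, 1, 1, 20, 25, 20, 1, 1, 1]
articles = [100] * 10
print(burst_states(mentions, articles))  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

On these toy counts, `bursty_intervals(mentions, articles)` returns a single interval covering batches 4 to 6, with a weight of roughly 26.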
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past, and good synonym candidates for a temporal search where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Recognizing named entities
Step 1: Partition Wikipedia with respect to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named-entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: "George W. Bush"
$\xi_{ij} = (e_i, s_j)$, i.e., (President_of_the_United_States, "George W. Bush")
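Step 1 can be sketched in Python as follows, assuming a snapshot for month t holds, for every page, the text of its latest revision at or before the first day of that month (matching the first-of-month snapshot dates in the experiment setting). The input shape `(page_title, timestamp, text)` and both helper names are hypothetical; a real pipeline would stream a history dump rather than hold it in memory.

```python
from collections import defaultdict
from datetime import datetime


def month_starts(first, last):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield datetime(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def monthly_snapshots(revisions, first, last):
    """Build W = {W_t1, ..., W_tz} keyed by 'YYYY-MM'.

    revisions: iterable of (page_title, timestamp, text) tuples.
    Each snapshot maps a page to its latest revision at or before the month start.
    """
    by_page = defaultdict(list)
    for title, ts, text in revisions:
        by_page[title].append((ts, text))
    for revs in by_page.values():
        revs.sort(key=lambda rev: rev[0])

    snapshots = {}
    for t in month_starts(first, last):
        snap = {}
        for title, revs in by_page.items():
            latest = None
            for ts, text in revs:
                if ts > t:
                    break
                latest = text
            if latest is not None:  # page did not exist yet otherwise
                snap[title] = latest
        snapshots[t.strftime("%Y-%m")] = snap
    return snapshots
```

As a check on the month arithmetic, `month_starts(datetime(2001, 3, 1), datetime(2008, 3, 1))` yields 85 dates, the snapshot count reported for 03/2001 to 03/2008.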
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting the anchor texts of article links
Step 2: Accumulate the entity-synonym relationships of all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{11}, \ldots, \xi_{nm}\}$
Example: in [[President_of_the_United_States|Barack Obama]], "Barack Obama" is the anchor text linking to the article President_of_the_United_States
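A minimal Python sketch of Steps 1 and 2 for one snapshot, assuming wiki links have the plain `[[Target]]` or `[[Target|anchor]]` form and that titles are normalized the MediaWiki way (spaces and underscores are equivalent, first letter capitalized). The regex and helper names are illustrative assumptions; real wikitext also contains templates, file links and nested markup that this sketch does not handle.

```python
import re
from collections import Counter

# [[Target]], [[Target#Section]] or [[Target|anchor text]]; no nested brackets.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def normalize_title(title):
    """Treat spaces and underscores alike and capitalize the first letter."""
    t = title.strip().replace(" ", "_")
    return t[:1].upper() + t[1:]


def extract_anchor_synonyms(wikitext, entities):
    """Count (entity, anchor_text) pairs for links that point at a known entity page."""
    pairs = Counter()
    for m in WIKILINK.finditer(wikitext):
        target = normalize_title(m.group(1))
        if target not in entities:
            continue
        # Without an explicit anchor, the displayed text is the title itself.
        anchor = (m.group(2) or m.group(1).replace("_", " ")).strip()
        pairs[(target, anchor)] += 1
    return pairs


text = "[[President_of_the_United_States|Barack Obama]] met [[Joe Biden|Biden]]."
print(extract_anchor_synonyms(text, {"President_of_the_United_States"}))
```

Summing these counters over all pages of a snapshot gives the synonym snapshot $S_{t_k}$ together with per-snapshot frequencies.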
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time periods of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents.
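Since the time of a synonym comes from snapshot timestamps rather than from the text, a time period can be sketched as the first and last snapshot in which an entity-synonym pair appears. This Python sketch assumes snapshots keyed by 'YYYY-MM' strings (which sort chronologically) and ignores gaps between appearances; the helper name is hypothetical.

```python
def synonym_time_periods(snapshot_pairs):
    """Map each (entity, synonym) pair to (first_month, last_month) of appearance.

    snapshot_pairs: dict mapping 'YYYY-MM' -> iterable of (entity, synonym) pairs.
    Months in between are not checked, so gaps are absorbed into the period.
    """
    periods = {}
    for month in sorted(snapshot_pairs):  # 'YYYY-MM' strings sort chronologically
        for pair in snapshot_pairs[month]:
            first = periods[pair][0] if pair in periods else month
            periods[pair] = (first, month)
    return periods


snaps = {
    "2005-05": [("Pope Benedict XVI", "Joseph Ratzinger")],
    "2009-03": [("Pope Benedict XVI", "Joseph Ratzinger")],
}
print(synonym_time_periods(snaps))
# prints {('Pope Benedict XVI', 'Joseph Ratzinger'): ('2005-05', '2009-03')}
```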
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
- 1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
- Generate bursty periods of $\xi_{ij}$ by computing a rate of occurrence from document streams
- Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
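Kleinberg's algorithm has a continuous (inter-arrival gap) form and a batched form. The Python sketch below uses the batched form with the settings reported in the experiment setting (two states, rate ratio s = 2, gamma = 1), treating each batch as, say, one month of NYT articles with r[t] of d[t] articles mentioning the synonym. It is an illustrative reimplementation under those assumptions, not the Kleinberg code used in the experiments.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched burst detection in the style of Kleinberg (KDD 2002).

    r[t]: documents in batch t mentioning the synonym; d[t]: all documents in batch t.
    Returns (states, bursts); bursts holds (start, end, weight) for runs of state 1.
    """
    n, R, D = len(r), sum(r), sum(d)
    if n == 0 or R == 0:
        return [0] * n, []
    # Base rate p0 and bursty rate p1 = s * p0, both capped below 1.
    p = [min(R / D, 0.9999), min(s * R / D, 0.9999)]

    def cost(i, t):
        # Negative log-likelihood of r[t] hits among d[t] docs at rate p[i]; the
        # binomial coefficient is identical for both states and is dropped.
        return -(r[t] * math.log(p[i]) + (d[t] - r[t]) * math.log(1 - p[i]))

    up = gamma * math.log(n)  # cost of moving up from state 0 to state 1
    best = [cost(0, 0), up + cost(1, 0)]  # the automaton starts in state 0
    back = []  # back[t - 1][j]: best predecessor of state j at batch t
    for t in range(1, n):
        prev0 = 0 if best[0] <= best[1] else 1  # moving down is free
        prev1 = 0 if best[0] + up < best[1] else 1
        back.append((prev0, prev1))
        best = [best[prev0] + cost(0, t),
                best[prev1] + (up if prev1 == 0 else 0.0) + cost(1, t)]

    # Viterbi backtracking from the cheaper final state.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()

    bursts, t = [], 0
    while t < n:
        if states[t] == 0:
            t += 1
            continue
        start = t
        while t < n and states[t] == 1:
            t += 1
        # Burst weight: cost saved by explaining the run with state 1 instead of 0.
        weight = sum(cost(0, u) - cost(1, u) for u in range(start, t))
        bursts.append((start, t - 1, weight))
    return states, bursts


states, bursts = kleinberg_bursts([5, 5, 5, 5, 30, 30, 30, 5, 5, 5], [100] * 10)
print(states)  # prints [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Here the three high-rate batches form one bursty interval (batches 4 to 6) whose weight is about 30; larger gamma makes entering the bursty state more expensive and suppresses short bursts.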
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Time Start | Time End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
- $pf(s_j)$ is the number of time partitions in which $s_j$ occurs
- $tf(s_j)$ is the average term frequency of $s_j$ over the time partitions in which it occurs: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ weighs the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high averaged frequency over time
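The TIDP formula translates directly into Python. This sketch assumes one {synonym: tf} mapping per time partition and mixes the raw pf and averaged tf values without normalization; the slide does not say whether the two features are rescaled before mixing, so that choice is an assumption. The counts in the example are made up.

```python
from collections import Counter


def tidp_scores(partition_tfs, mu=0.5):
    """TIDP(s) = mu * pf(s) + (1 - mu) * tf(s), on raw (unnormalized) values.

    partition_tfs: one {synonym: tf(s, p_i)} mapping per time partition p_i.
    pf(s): number of partitions in which s occurs.
    tf(s): sum_i tf(s, p_i) / pf(s), the average tf over those partitions.
    """
    pf, total = Counter(), Counter()
    for partition in partition_tfs:
        for syn, freq in partition.items():
            if freq > 0:
                pf[syn] += 1
                total[syn] += freq
    return {syn: mu * pf[syn] + (1 - mu) * total[syn] / pf[syn] for syn in pf}


partitions = [
    {"Barack Hussein Obama II": 2, "Sen. Barack Obama": 4},  # hypothetical counts
    {"Barack Hussein Obama II": 4},
    {"Barack Hussein Obama II": 3},
]
scores = tidp_scores(partitions)
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores)  # prints {'Barack Hussein Obama II': 3.0, 'Sen. Barack Obama': 2.5}
```

With these made-up counts, the synonym seen in all three partitions (3.0) outranks the one seen in a single partition (2.5), even though the latter has the higher per-partition frequency.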
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
- Time partitions are not considered because only synonyms in the particular time period $t_k$ are of interest
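Because TDP uses only the term frequency at $t_k$, ranking reduces to a sort within one snapshot. A Python sketch, assuming term frequencies are stored per time key such as 'YYYY-MM'; the helper name, the alphabetical tie-break and the counts are hypothetical.

```python
def top_k_tdp(snapshot_tfs, t_k, k):
    """Return the k synonyms with the highest TDP(s, t_k) = tf(s, t_k).

    snapshot_tfs: dict mapping a time key to {synonym: tf}.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    tfs = snapshot_tfs.get(t_k, {})
    return sorted(tfs, key=lambda s: (-tfs[s], s))[:k]


data = {"1990-01": {"Cardinal Ratzinger": 7, "Joseph Ratzinger": 3, "Ratzinger": 5}}
print(top_k_tdp(data, "1990-01", 2))  # prints ['Cardinal Ratzinger', 'Ratzinger']
```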
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper) and Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg: number of states = 2; ratio of the rate of the second state to the base state = 2; ratio of the rate of each subsequent state to the previous state = 2; gamma parameter of the HMM = 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\ s_1{\wedge}w_1\ s_2{\wedge}w_2 \cdots s_k{\wedge}w_k$, where $w_i$ is the TIDP score of $s_i$
Measurement: Mean Average Precision (MAP), R-precision and recall
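The expansion $q_{exp}$ can be sketched as string construction in Python, writing each boost as `term^weight` and quoting multi-word synonyms as phrases. Whether a particular engine, Terrier included, accepts exactly this string (a boosted quoted phrase) is an assumption to check against its query-language documentation; the helper name and example weights are hypothetical.

```python
def expand_query(original, weighted_synonyms, k):
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from (synonym, TIDP weight) pairs.

    Keeps the k highest-weighted synonyms; multi-word synonyms are quoted.
    """
    top = sorted(weighted_synonyms, key=lambda sw: sw[1], reverse=True)[:k]
    parts = [original]
    for syn, w in top:
        term = f'"{syn}"' if " " in syn else syn
        parts.append(f"{term}^{w:.4f}")
    return " ".join(parts)


print(expand_query("Barack Obama",
                   [("Senator Obama", 2.5), ("Barack Hussein Obama II", 3.0), ("Obama", 1.0)],
                   k=2))
# prints Barack Obama "Barack Hussein Obama II"^3.0000 "Senator Obama"^2.5000
```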
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- Select 20 strongly time-dependent queries
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with the filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
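The BPF filtering criteria can be sketched in Python under one reading of the slide: a relationship is discarded if its time interval is shorter than 6 months or its average frequency is below 2. That reading, the `Relationship` fields and the helper name are assumptions, not details confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical entity-synonym record with the two filtered statistics."""
    entity: str
    synonym: str
    months_active: int    # length of the synonym's time interval, in months
    avg_frequency: float  # average frequency over the snapshots where it appears


def passes_bpf_filter(rel, min_months=6, min_avg_freq=2.0):
    """Keep a relationship only if neither criterion (interval < 6, avg freq < 2) fires."""
    return rel.months_active >= min_months and rel.avg_frequency >= min_avg_freq


rels = [
    Relationship("Barack Obama", "Senator Barack Obama", 35, 4.2),
    Relationship("Barack Obama", "Obama's", 3, 5.0),
]
print([r.synonym for r in rels if passes_bpf_filter(r)])  # prints ['Senator Barack Obama']
```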
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the average term frequency of $s_j$ over the time partitions in which it occurs:
$$\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
$\mu$ weights the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
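Below is a minimal sketch of the TIDP score. It assumes each synonym's term frequencies are stored per time partition; the dictionary layout and the function names are hypothetical. The slide does not say whether pf and the averaged tf are normalized before mixing, so no normalization is applied here. Because the two features live on different scales, normalization may matter in practice.

```python
from typing import Dict, List, Tuple

def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * average tf(s_j)."""
    counts = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(counts)                      # partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(counts) / pf             # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

def rank_time_independent(synonyms: Dict[str, Dict[str, int]],
                          mu: float = 0.5) -> List[Tuple[str, float]]:
    """Sort synonyms by descending TIDP score."""
    scores = [(s, tidp(tfs, mu)) for s, tfs in synonyms.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For a synonym seen 4 and 2 times in two of three partitions, pf = 2 and the averaged tf is 3. With μ = 0.5 the score is 0.5 · 2 + 0.5 · 3 = 2.5.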
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
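The time-dependent ranking reduces to a lookup followed by a sort. The sketch below assumes term frequencies are indexed first by synonym and then by time partition, with years as partitions (an assumption). The function name is hypothetical, and the counts in the checks are made up.

```python
from typing import Dict, List, Tuple

def rank_time_dependent(tf_at: Dict[str, Dict[int, int]], t_k: int,
                        top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), dropping those unseen at t_k."""
    scored = [(s, by_time.get(t_k, 0)) for s, by_time in tf_at.items()]
    scored = [(s, tf) for s, tf in scored if tf > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))   # highest tf first
    return scored[:top_k]
```

Synonyms with no occurrences at t_k are dropped entirely. This matches the idea that only synonyms from that period are of interest.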
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time periods
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving the time of synonyms
Data collection
The whole history of the English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001 to 01/03/2008), about 2.8 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007
Tools
MWDumper: http://www.mediawiki.org/wiki/Mwdumper
Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg, with parameters:
Number of states: 2
Ratio of the rate of the second state to the base state: 2
Ratio of the rate of each subsequent state to the previous state: 2
Gamma parameter of the HMM: 1
Measurement: accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the original query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$$q_{exp} = q_{org}\ \ s_1{\wedge}w_1\ \ s_2{\wedge}w_2\ \cdots\ s_k{\wedge}w_k$$
where $s_i{\wedge}w_i$ boosts synonym $s_i$ by the weight $w_i = \mathrm{TIDP}(s_i)$ (Terrier's term^weight syntax)
Measurement: Mean Average Precision (MAP), R-precision and recall
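The expanded query above follows Terrier's term^weight boost syntax. Below is a minimal sketch of building such a query string, assuming the synonyms already carry TIDP weights. The function name is hypothetical. Quoting multi-word synonyms as phrases is also an assumption; check it against the query language of the engine in use.

```python
from typing import List, Tuple

def expand_query(q_org: str, weighted_synonyms: List[Tuple[str, float]],
                 k: int = 3) -> str:
    """Append the top-k synonyms to q_org, each boosted by its weight as 's^w'."""
    top = sorted(weighted_synonyms, key=lambda item: item[1], reverse=True)[:k]
    parts = [q_org]
    for syn, w in top:
        # Assumption: multi-word synonyms are quoted as a phrase before boosting.
        term = f'"{syn}"' if " " in syn else syn
        parts.append(f"{term}^{w:.2f}")
    return " ".join(parts)
```

Only the k highest-weighted synonyms are appended, so low-TIDP candidates never reach the engine.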
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com, containing more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (named entity and time period) and their synonyms
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
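The two query-expansion experiments use MAP, R-precision, recall and precision at k. Each can be computed per query as below. This is a plain sketch of the standard definitions with hypothetical function names, not the evaluation tooling used in the work. MAP is the mean of average precision over all topics.

```python
from typing import List, Sequence, Set, Tuple

def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)      # unretrieved relevant documents count as 0

def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0

def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked if doc in relevant) / len(relevant)

def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Precision at k divides by k even when fewer than k documents are retrieved. This keeps scores comparable across queries.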
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
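The BPF-NERW filtering step can be sketched as a predicate over extracted relationships. The slide lists two criteria but does not say whether a relationship is discarded when either one holds or only when both hold. The sketch therefore exposes that choice as a hypothetical `require_both` flag. Months are encoded as integers (year * 12 + month index), which is also an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Relationship:
    entity: str
    synonym: str
    first_month: int       # hypothetical encoding: year * 12 + (month - 1)
    last_month: int
    avg_frequency: float   # average frequency over the partitions it occurs in

def keep(rel: Relationship, min_months: int = 6, min_avg_freq: float = 2.0,
         require_both: bool = False) -> bool:
    """Apply the two BPF-NERW filtering criteria to one relationship."""
    too_short = (rel.last_month - rel.first_month) < min_months
    too_rare = rel.avg_frequency < min_avg_freq
    drop = (too_short and too_rare) if require_both else (too_short or too_rare)
    return not drop

def filter_relationships(rels: Iterable[Relationship], **options) -> List[Relationship]:
    """Keep only the relationships that pass the filtering criteria."""
    return [r for r in rels if keep(r, **options)]
```

Under the stricter default, a relationship seen for only a couple of months is dropped even if it is frequent. With `require_both=True` it survives.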
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2, since values of k > 2 bring noise into the NERQ process
Number of queries under the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
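The two query-side recognition methods differ only in whether related pages are added. The sketch below assumes a lookup table from lower-cased Wikipedia titles to canonical titles. It also assumes a caller-supplied `related(query, k)` function that returns related page titles, for example from a search over Wikipedia. Both are hypothetical stand-ins, because the slides do not describe how related pages are retrieved.

```python
from typing import Callable, Dict, List

def mw_nerq(query: str, titles: Dict[str, str]) -> List[str]:
    """MW-NERQ: the whole query exactly matches a Wikipedia page title."""
    hit = titles.get(query.strip().lower())
    return [hit] if hit else []

def mrw_nerq(query: str, titles: Dict[str, str],
             related: Callable[[str, int], List[str]], k: int = 2) -> List[str]:
    """MRW-NERQ: the exact match (if any) plus the top-k related pages."""
    entities = mw_nerq(query, titles)
    for page in related(query, k)[:k]:
        if page not in entities:
            entities.append(page)
    return entities
```

Adding related pages explains why MRW-NERQ tags many more queries as named entities (149 vs. 42). It also explains why a larger k risks pulling in noise.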
Query expansion using time-independent synonyms
MAP, R-precision and recall (… indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model Bose-Einstein 1 (Bo1)
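The note names Bose-Einstein 1 (Bo1) from the DFR framework as the model for weighting expansion terms in the feedback runs. In its usual formulation, a candidate term t is scored as w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N. Here tf_x is the frequency of t in the top-ranked feedback documents, F is its frequency in the collection, and N is the number of documents in the collection. The sketch below selects the top terms this way. It assumes stopword removal has already happened, and it does not reproduce how Terrier merges the weights into the reformulated query. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def bo1_weight(tf_x: int, coll_freq: int, num_docs: int) -> float:
    """Bo1: tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N."""
    p_n = coll_freq / num_docs
    return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)

def select_expansion_terms(feedback_docs: Sequence[List[str]],
                           coll_freq: Dict[str, int], num_docs: int,
                           n_terms: int = 40) -> List[Tuple[str, float]]:
    """Score every term seen in the feedback documents and keep the best n_terms."""
    tf_x = Counter(tok for doc in feedback_docs for tok in doc)
    scored = [(t, bo1_weight(c, coll_freq[t], num_docs))
              for t, c in tf_x.items() if coll_freq.get(t, 0) > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n_terms]
```

Terms that are frequent in the feedback documents but rare in the collection (small P_n) receive the largest weights.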
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (… indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a temporal query expanded with synonyms with respect to the time $t_q$
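TSQ expands a temporal query ($w_q$, $t_q$) only with synonyms that are valid at $t_q$. The sketch below assumes each synonym carries a period in years and a weight (for instance a TDP score). It treats $t_q$ as a year range, like the time periods in the example queries. The data class and function names are hypothetical, and the periods and weights in the checks are toy values.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TimedSynonym:
    synonym: str
    start: int      # first year of validity (inclusive)
    end: int        # last year of validity (inclusive)
    weight: float   # e.g. a TDP score

def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed year ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

def temporal_synonym_query(w_q: str, t_start: int, t_end: int,
                           synonyms: List[TimedSynonym], k: int = 3) -> List[str]:
    """Keyword plus the top-k synonyms whose period overlaps [t_start, t_end]."""
    valid = [s for s in synonyms if overlaps(s.start, s.end, t_start, t_end)]
    valid.sort(key=lambda s: -s.weight)
    return [w_q] + [s.synonym for s in valid[:k]]
```

A query about Pope Benedict XVI restricted to years before 2005 picks up only the earlier name. A later range picks up only the later one.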
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
Step 1 For each entityei isin Etk find its synonyms byextracting anchor texts fromarticle links
Step 2 Accumulateentity-synonym relationships forall entities at time tk ie asynonym snapshotStk = ξ11 ξnm
Example[[President_of_the_United_
States|BarackObama]] ldquoBarackObamardquo is anchor texts linking to thearticle President_of_the_United_
States
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Extracting synonyms
OutputEntity-synonym relationships and time periods
Named Entity Synonym Time Period
Pope Benedict XVICardinal Joseph Ratzinger 052005 - 032009Joseph Ratzinger 052005 - 032009Pope Benedict XVI 052005 - 032009
Barack ObamaBarack Hussein Obama II 022007 - 032009Sen Barack Obama 072007 - 032009Senator Barack Obama 052006 - 032009
Hillary Rodham ClintonHillary Clinton 082003 - 032009Sen Hillary Clinton 032007 - 032009Senator Clinton 112007 - 032009
The time of synonyms are timestamps of Wikipedia articles (8 years) in which they
appear not temporal expression extracted from the contents
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Query expansion using time-independent synonyms
MAP, R-precision and Recall (statistical significance tested at p < 0.05)
Method | MW-NERQ: MAP, R-precision, Recall | MRW-NERQ: MAP, R-precision, Recall
PM | 0.2889, 0.3309, 0.6185 | 0.2455, 0.2904, 0.5629
PRF | 0.3469, 0.3711, 0.6944 | 0.3002, 0.3227, 0.6761
SQE-PRF | 0.3608, 0.3652, 0.7405 | 0.2507, 0.2665, 0.5932
SWQE-PRF | 0.3653, 0.3861, 0.7388 | 0.2885, 0.3080, 0.6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
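For reference, a sketch of the usual definitions behind the three metrics in the table, assuming one ranked list of document IDs per query and a set of relevant IDs per query (the function names are illustrative and not taken from Terrier or trec_eval). Average precision averages the precision at each relevant hit over all relevant documents, MAP averages it over queries, R-precision is precision at rank R (R = number of relevant documents), and recall is the share of relevant documents retrieved.

```python
from typing import Callable, Dict, List, Set

Ranking = List[str]


def average_precision(ranked: Ranking, relevant: Set[str]) -> float:
    """Mean of the precision values at the ranks of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)   # unretrieved relevant documents contribute 0


def r_precision(ranked: Ranking, relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(1 for doc in ranked[:r] if doc in relevant) / r if r else 0.0


def recall(ranked: Ranking, relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved at all."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def mean_over_queries(metric: Callable[[Ranking, Set[str]], float],
                      runs: Dict[str, Ranking], qrels: Dict[str, Set[str]]) -> float:
    """Average a per-query metric over all judged queries (MAP for average_precision)."""
    return sum(metric(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)
```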
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (statistical significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search with a Temporal Query, i.e., a keyword w_q and a time t_q
TSQ: search with a Temporal Query expanded with Synonyms w.r.t. time t_q
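Precision at k, the measure used for the temporal queries, can be sketched in the same way; the input layout (ranked document IDs per query, relevant IDs per query) and the function names are assumptions for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant.

    Positions beyond the end of a short ranking count as non-relevant,
    so the denominator is always k.
    """
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_precision_at_k(runs: Dict[str, List[str]],
                        qrels: Dict[str, Set[str]], k: int) -> float:
    """Average P@k over all judged queries."""
    return sum(precision_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()) / len(qrels)
```

For example, a ranking with a single relevant document in its top 10 scores 0.1, 0.05 and about 0.0333 at k = 10, 20 and 30.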
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using the NYT corpus
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Extracting synonyms
Output: entity-synonym relationships and time periods
Named Entity | Synonym | Time Period
Pope Benedict XVI | Cardinal Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Joseph Ratzinger | 05/2005 - 03/2009
Pope Benedict XVI | Pope Benedict XVI | 05/2005 - 03/2009
Barack Obama | Barack Hussein Obama II | 02/2007 - 03/2009
Barack Obama | Sen. Barack Obama | 07/2007 - 03/2009
Barack Obama | Senator Barack Obama | 05/2006 - 03/2009
Hillary Rodham Clinton | Hillary Clinton | 08/2003 - 03/2009
Hillary Rodham Clinton | Sen. Hillary Clinton | 03/2007 - 03/2009
Hillary Rodham Clinton | Senator Clinton | 11/2007 - 03/2009
The time of a synonym is given by the timestamps of the Wikipedia articles (8 years of history) in which it appears, not by temporal expressions extracted from the article contents.
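Because a synonym's time period comes from the timestamps of the Wikipedia articles it appears in, one way to compute it is to keep the earliest and latest timestamp per entity-synonym pair. This is a sketch under that reading: the input, an iterable of hypothetical `(entity, synonym, timestamp)` tuples, stands in for whatever extraction step produces the pairs, which is not shown here.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]   # (entity, synonym)


def synonym_periods(occurrences: Iterable[Tuple[str, str, date]]) -> Dict[Key, Tuple[date, date]]:
    """Map each (entity, synonym) pair to (first, last) article timestamp."""
    periods: Dict[Key, Tuple[date, date]] = {}
    for entity, synonym, ts in occurrences:
        key = (entity, synonym)
        if key in periods:
            first, last = periods[key]
            periods[key] = (min(first, ts), max(last, ts))   # widen the period
        else:
            periods[key] = (ts, ts)                          # first sighting
    return periods
```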
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of ξ_ij by computing the rate of occurrence from document streams
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
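As an illustration only, here is a minimal two-state version of Kleinberg's batched burst model, using the settings listed in the experiment setup (2 states, rate ratio 2, gamma 1). It assumes the NYT stream is cut into time batches where d[t] counts all articles and r[t] counts those mentioning a synonym (a hypothetical input layout). A Viterbi pass picks the cheapest sequence of base and bursty states, and each maximal bursty run is reported with a burst weight. This is a sketch, not the implementation used in the paper.

```python
import math
from typing import List, Optional, Tuple


def _neg_log_lik(p: float, hits: int, total: int) -> float:
    """Binomial negative log-likelihood of `hits` out of `total` at rate p.

    The binomial coefficient is identical for both states, so it is omitted:
    it cannot change which state sequence is cheapest.
    """
    return -(hits * math.log(p) + (total - hits) * math.log(1.0 - p))


def _rates(r: List[int], d: List[int], s: float) -> List[float]:
    p0 = sum(r) / sum(d)                 # base rate: overall share of mentions
    return [p0, min(s * p0, 0.9999)]     # bursty rate is s times the base rate


def burst_states(r: List[int], d: List[int], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Return 0 (base) or 1 (bursty) for each time batch t.

    r[t]: articles in batch t that mention the synonym (hypothetical input)
    d[t]: all articles in batch t
    """
    n = len(r)
    if sum(r) == 0:
        return [0] * n                   # never mentioned, so never bursty
    rates = _rates(r, d, s)
    up_cost = gamma * math.log(n)        # price of moving up into the bursty state

    cost = [0.0, math.inf]               # the automaton starts in the base state
    back: List[Tuple[int, int]] = []
    for t in range(n):
        new_cost, prev = [], []
        for j in (0, 1):
            # cost of arriving in state j from each state i; moving down is free
            arrive = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if arrive[0] <= arrive[1] else 1
            new_cost.append(arrive[i_best] + _neg_log_lik(rates[j], r[t], d[t]))
            prev.append(i_best)
        cost = new_cost
        back.append((prev[0], prev[1]))

    # Viterbi traceback of the cheapest state sequence
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states


def burst_intervals(r: List[int], d: List[int], s: float = 2.0,
                    gamma: float = 1.0) -> List[Tuple[int, int, float]]:
    """Group consecutive bursty batches into (start, end, weight) intervals.

    The weight sums, over the interval, how much better the bursty rate
    explains the counts than the base rate (Kleinberg's burst weight).
    """
    states = burst_states(r, d, s, gamma)
    intervals: List[Tuple[int, int, float]] = []
    if 1 not in states:
        return intervals
    rates = _rates(r, d, s)
    start: Optional[int] = None
    for t, st in enumerate(states + [0]):   # trailing 0 closes a final burst
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(_neg_log_lik(rates[0], r[u], d[u]) - _neg_log_lik(rates[1], r[u], d[u])
                         for u in range(start, t))
            intervals.append((start, t - 1, weight))
            start = None
    return intervals
```

For example, with d = [100] * 10 and r = [1, 1, 1, 1, 20, 25, 22, 1, 1, 1], `burst_states` marks batches 4 to 6 as bursty, and `burst_intervals` reports that run as a single interval with a positive weight.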
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst Weight | Start | End
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition: Class A, time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B, time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
$$tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
pf(s_j) is the time partition frequency, i.e., the number of time partitions in which s_j occurs
tf(s_j) is the averaged term frequency of s_j over the time partitions p_i
μ controls the relative importance of the temporal feature and the frequency feature
μ = 0.5 yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high averaged frequency over time
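The TIDP formula translates directly into code. This sketch assumes the per-partition frequencies tf(s_j, p_i) are already counted (the dictionary layout and the function names are hypothetical) and uses the raw pf and averaged tf exactly as written on the slide, without any normalization; μ defaults to 0.5, the best value reported.

```python
from typing import Dict, List, Tuple


def tidp(freqs_by_partition: Dict[str, List[int]], mu: float = 0.5) -> Dict[str, float]:
    """TIDP(s) = mu * pf(s) + (1 - mu) * tf(s), following the slide's formula.

    freqs_by_partition[s][i] is tf(s, p_i), the frequency of synonym s
    in time partition p_i (hypothetical input layout).
    """
    scores: Dict[str, float] = {}
    for syn, freqs in freqs_by_partition.items():
        pf = sum(1 for f in freqs if f > 0)   # partitions in which s occurs
        if pf == 0:
            scores[syn] = 0.0
            continue
        avg_tf = sum(freqs) / pf              # sum_i tf(s, p_i) / pf(s)
        scores[syn] = mu * pf + (1 - mu) * avg_tf
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Highest TIDP first; ties broken alphabetically for determinism."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

With μ = 0.5, a synonym seen 4 times in each of 4 partitions scores 0.5·4 + 0.5·4 = 4.0, while one seen 6 times in a single partition scores 0.5·1 + 0.5·6 = 3.5, so the steadier synonym ranks first.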
Ranking time-dependent synonyms
Definition: Given a time t_k, time-dependent synonyms at t_k are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where tf(s_j, t_k) is the term frequency of s_j at t_k
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered because only synonyms in the particular time period t_k are of interest
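TDP needs only the term frequency of each synonym at the query time t_k. A sketch, assuming frequencies are bucketed by year; the granularity, the dictionary layout, the function name and the example counts in the usage are all hypothetical.

```python
from typing import Dict, List, Tuple


def tdp_rank(tf_by_time: Dict[str, Dict[int, int]], t_k: int, k: int = 3) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_by_time[s][t] is the frequency of synonym s in time period t
    (hypothetical layout). Synonyms that do not occur at t_k are dropped.
    """
    scored = [(syn, freqs.get(t_k, 0)) for syn, freqs in tf_by_time.items()]
    scored = [(syn, w) for syn, w in scored if w > 0]
    scored.sort(key=lambda sw: (-sw[1], sw[0]))   # highest tf first, ties alphabetical
    return scored[:k]
```

For instance, with made-up counts where "Cardinal Ratzinger" occurs 12 times in 2003 and "Pope Benedict XVI" only from 2006 onward, a query at t_k = 2003 is expanded with the cardinal's name, not the papal one.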
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms: data collection
The whole history of English Wikipedia
All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes
4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper), Oracle Berkeley DB version 4.7.25
Burst detection algorithm implemented by Kleinberg:
Number of states: 2
Ratio of the rate of the second state to the base state: 2
Ratio of the rate of each subsequent state to the previous state: 2
Gamma parameter of the HMM: 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection:
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools:
Terrier: an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms s_1, ..., s_k, using their TIDP scores w_1, ..., w_k as boosting weights:
$$q_{exp} = q_{org} \; s_1{\wedge}w_1 \; s_2{\wedge}w_2 \; \cdots \; s_k{\wedge}w_k$$
where s_i∧w_i boosts synonym s_i by its weight w_i = TIDP(s_i)
Measurement: Mean Average Precision (MAP), R-precision and Recall
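A sketch of assembling q_exp from TIDP scores. The output renders each s_i∧w_i as `"synonym"^weight`, modeled on Terrier-style `term^weight` boosting. Whether Terrier accepts this exact form for multi-word phrases is an assumption to check against its query-language documentation, and `expand_query` is a hypothetical helper.

```python
from typing import Dict


def expand_query(q_org: str, tidp_scores: Dict[str, float], k: int = 3) -> str:
    """Append the top-k synonyms to the original query, each boosted by its TIDP score."""
    top = sorted(tidp_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    # each synonym s_i becomes "s_i"^w_i (assumed Terrier-style boost syntax)
    boosted = [f'"{syn}"^{w:.4f}' for syn, w in top]
    return " ".join([q_org] + boosted)
```

Keeping the original query terms unweighted and boosting only the added synonyms mirrors the structure of q_exp on the slide, where q_org is followed by the weighted s_i.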
Query expansion using time-dependent synonyms: data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms
Named Entity | Time Period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
Analyze the New York Time Annotated Corpus (NYT) to discovermore accurate time
18M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg in KDDrsquo2002]
Generate bursty periods of ξij by computing a rate of occurrencefrom document streams
Output bursty intervals and bursty weight ie periods ofoccurrence and intensity
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Entity Recognition and Synonym ExtractionImproving the Accuracy of Time
Improving the accuracy of time using burst detection
OutputResults from burst-detection algorithm
Synonym Entity Burst Weight TimeStart End
President Reagan Ronald Reagan 5506858 011987 021989President Ronald Ronald Reagan 100401 011989 031990President Ronald Ronald Reagan 67208 071990 021993
Senator Clinton Hillary Rodham Clinton 18214 012001 102001Senator Clinton Hillary Rodham Clinton 17732 052002 012003Senator Clinton Hillary Rodham Clinton 172356 062003 112004
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
Definition: Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j)$$
$pf(s_j)$ is the number of time partitions in which $s_j$ occurs
$\overline{tf}(s_j)$ is the average term frequency of $s_j$ over the time partitions in which it occurs:
$$\overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
$\mu$ controls the relative importance of the temporal feature and the frequency feature
$\mu = 0.5$ yields the best performance in the experiments
Intuition: The model measures the popularity of synonyms based on two factors:
Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
High usage over time, i.e., a high average frequency over time
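The TIDP mixture above can be computed directly from per-partition counts. The following is a minimal sketch, not the authors' code. The function name `tidp_scores` and the input layout (synonym → {partition: tf}) are hypothetical. The slide does not say whether pf and the averaged tf are normalized before mixing, so the sketch combines them on their raw scales.

```python
def tidp_scores(counts, mu=0.5):
    """Weight time-independent synonyms: TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j).

    counts maps each synonym s_j to {time partition p_i: tf(s_j, p_i)}.
    """
    scores = {}
    for synonym, per_partition in counts.items():
        # pf(s_j): number of time partitions in which s_j occurs
        pf = sum(1 for tf in per_partition.values() if tf > 0)
        if pf == 0:
            scores[synonym] = 0.0
            continue
        # averaged tf: sum_i tf(s_j, p_i) / pf(s_j)
        avg_tf = sum(per_partition.values()) / pf
        # both features are mixed on their raw scales (no normalization assumed)
        scores[synonym] = mu * pf + (1 - mu) * avg_tf
    return scores


# Hypothetical counts for two synonyms over three partitions
scores = tidp_scores({"s1": {"p1": 2, "p2": 3, "p3": 4}, "s2": {"p1": 10, "p2": 0}})
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these made-up counts, s2 outranks s1 because its averaged frequency outweighs its single partition. This illustrates why the relative scale of the two features matters when choosing μ.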
Ranking time-dependent synonyms
Definition: Given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: Only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest
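Because TDP reduces to the term frequency at $t_k$, ranking time-dependent synonyms is a lookup and a sort. This is a minimal sketch under stated assumptions. The function name `rank_time_dependent` is hypothetical, time periods are represented as year keys, and the counts in the usage lines are invented for illustration.

```python
def rank_time_dependent(counts, t_k, top_k=None):
    """Rank synonyms at time period t_k by TDP(s_j, t_k) = tf(s_j, t_k).

    counts maps each synonym s_j to {time period: tf(s_j, period)}.
    """
    # TDP is just the term frequency of the synonym in period t_k
    scored = [(syn, per_period.get(t_k, 0)) for syn, per_period in counts.items()]
    # synonyms that never occur at t_k are not candidates for that period
    scored = [(syn, tf) for syn, tf in scored if tf > 0]
    # highest frequency first; ties broken alphabetically for determinism
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored if top_k is None else scored[:top_k]


# Hypothetical counts keyed by year
counts = {
    "Cardinal Ratzinger": {2004: 7},
    "Cardinal Joseph Ratzinger": {2004: 3, 2006: 1},
    "Benedict XVI": {2006: 20},
}
before_2005 = rank_time_dependent(counts, 2004)
```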
Overview of experiments
Our experimental evaluation is divided into three main parts:
1 Extracting synonyms and improving the accuracy of their time
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection
The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, 85 snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 2.8 Terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools: MWDumper (http://www.mediawiki.org/wiki/Mwdumper); Oracle Berkeley DB version 4.7.25
Burst detection algorithm as implemented by Kleinberg:
Number of states: 2
Ratio of the rate of the second state to the base state: 2
Ratio of the rate of each subsequent state to the previous state: 2
Gamma parameter of the HMM: 1
Measurement: Accuracy
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier, an open-source search engine developed by the University of Glasgow
BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$$
where $s_j^{w_j}$ denotes synonym $s_j$ boosted by weight $w_j = TIDP(s_j)$
Measurement: Mean Average Precision (MAP), R-precision and Recall
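The weighted expansion $q_{exp}$ can be sketched as string construction in the `term^weight` boost notation used on the slide. This is an illustration only. `expand_query` is a hypothetical helper, and the weights are assumed to be precomputed TIDP scores. Wrapping multi-word synonyms in double quotes is an assumption about the target engine; check the engine's query-language documentation for how phrases and boosts combine.

```python
def expand_query(original_query, synonym_scores, k):
    """Append the top-k synonyms to the query, each boosted by its (precomputed) TIDP score."""
    # highest-scoring synonyms first; ties broken alphabetically
    top = sorted(synonym_scores.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [original_query]
    for synonym, weight in top:
        # assumption: multi-word synonyms are submitted as quoted phrases
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.4f}")
    return " ".join(parts)


# Hypothetical TIDP scores
q_exp = expand_query(
    "barack obama",
    {"Barack Hussein Obama II": 0.75, "Obama": 0.5, "Senator Obama": 0.25},
    k=2,
)
```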
Query expansion using time-dependent synonyms
Data collection
NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: Precision at 10, 20 and 30 retrieved documents
Examples of temporal queries and their synonyms:
| Named entity | Time period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |
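The two experiments use standard ranked-retrieval measures: MAP, R-precision and recall for the time-independent setting, and precision at 10, 20 and 30 for the time-dependent one. The sketch below follows the usual TREC-style definitions. The function names are hypothetical, and this is not the evaluation tool used in the talk.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (divides by k even if fewer are retrieved)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # relevant documents that are never retrieved contribute zero
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
```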
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
| NER method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
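The BPF-NERW filtering criteria can be read as a predicate over extracted entity-synonym pairs. This sketch rests on two labeled assumptions. First, a pair is discarded when either criterion holds. Second, the "time interval" is the span between the first and last snapshot in which the pair occurs. The `SynonymPair` record, `months_between` and `keep_pair` are hypothetical names, and the values in the usage lines are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SynonymPair:
    entity: str
    synonym: str
    first_seen: date       # assumption: first snapshot containing the pair
    last_seen: date        # assumption: last snapshot containing the pair
    avg_frequency: float   # average frequency of the pair across snapshots


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def keep_pair(pair, min_months=6, min_avg_freq=2.0):
    """Discard pairs whose interval is under 6 months or whose average frequency is under 2."""
    if months_between(pair.first_seen, pair.last_seen) < min_months:
        return False
    if pair.avg_frequency < min_avg_freq:
        return False
    return True


# Hypothetical pairs
stable = SynonymPair("Ronald Reagan", "President Reagan", date(2001, 3, 1), date(2008, 3, 1), 5.0)
short_lived = SynonymPair("Ronald Reagan", "President Ronald", date(2005, 1, 1), date(2005, 4, 1), 5.0)
rare = SynonymPair("Ronald Reagan", "Reagan Revolution", date(2001, 3, 1), date(2008, 3, 1), 1.5)
kept = [p.synonym for p in (stable, short_lived, rare) if keep_pair(p)]
```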
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2 is used; k > 2 brings noise into the NERQ process
Number of queries recognized by the two NERQ methods:
| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |
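The difference between the two query-side recognizers can be sketched as follows. MW-NERQ accepts only an exact title match. MRW-NERQ also considers the top-k related pages (k = 2). Several assumptions apply. The set of entity page titles is given, the ranked list of related pages comes from some external Wikipedia search, and only related pages that are themselves entity pages are kept. `mw_nerq`, `mrw_nerq` and `normalize` are hypothetical names.

```python
def normalize(text):
    """Case-fold and collapse whitespace so that exact matching ignores trivial differences."""
    return " ".join(text.lower().split())


def mw_nerq(query, entity_titles):
    """MW-NERQ: the query is an entity if it exactly matches an entity page title."""
    index = {normalize(title): title for title in entity_titles}
    match = index.get(normalize(query))
    return [match] if match else []


def mrw_nerq(query, entity_titles, related_pages, k=2):
    """MRW-NERQ: exact match plus the top-k related pages that are entity pages (assumption)."""
    found = mw_nerq(query, entity_titles)
    entities = set(entity_titles)
    for title in related_pages[:k]:
        if title in entities and title not in found:
            found.append(title)
    return found


# Hypothetical entity pages and related-page ranking
entity_titles = ["Barack Obama", "Hillary Rodham Clinton", "Kmart"]
related = ["Barack Obama", "Health care reform", "Hillary Rodham Clinton"]
```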
Query expansion using time-independent synonyms
MAP, R-precision and Recall (significance level p < 0.05)
| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1 (Bo1)
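The expansion setting in the note (40 terms from the top-10 documents, Bose-Einstein 1 weighting) can be illustrated with the Bo1 formula from the Divergence From Randomness framework. For a candidate term, w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F / N. Here tf_x is the term's frequency in the top-ranked documents, F is its frequency in the whole collection, and N is the number of documents. The sketch is hypothetical (`bo1_weights`, `top_expansion_terms`). It omits how the selected terms are merged with the original query weights.

```python
import math


def bo1_weights(term_freq_in_top_docs, collection_freq, num_docs):
    """Bo1 weight per candidate term: tf_x * log2((1 + P_n) / P_n) + log2(1 + P_n), P_n = F / N."""
    weights = {}
    for term, tf_x in term_freq_in_top_docs.items():
        p_n = collection_freq[term] / num_docs  # expected rate of the term in the collection
        weights[term] = tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def top_expansion_terms(weights, n_terms=40):
    """Pick the n highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


# Hypothetical statistics: "a" is common in the collection, "b" is rare
weights = bo1_weights({"a": 3, "b": 2}, {"a": 1000, "b": 10}, num_docs=1000)
expansion = top_expansion_terms(weights, n_terms=1)
```

The rare term "b" wins despite its lower frequency in the feedback documents, which is the intended effect of weighting by divergence from the collection rate.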
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance level p < 0.05)
| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |
TQ: search with a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search with a temporal query expanded with synonyms with respect to time $t_q$
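The TQ/TSQ distinction can be sketched by attaching time periods to synonyms and expanding only with those whose period contains $t_q$. The records reuse three rows from the temporal-query example table. The `TimedSynonym` type, the year-granularity check and the dictionary query representation are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSynonym:
    entity: str
    synonym: str
    start_year: int
    end_year: int


# Rows taken from the example table of temporal queries
EXAMPLES = [
    TimedSynonym("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    TimedSynonym("Barack Obama", "Senator Obama", 2005, 2007),
    TimedSynonym("Kmart", "Kresge", 1987, 1987),
]


def temporal_query(keyword, year):
    """TQ: the keyword w_q plus the time constraint t_q."""
    return {"keywords": [keyword], "year": year}


def temporal_synonym_query(keyword, year, synonyms):
    """TSQ: TQ expanded with the synonyms whose time period contains t_q."""
    expansions = [s.synonym for s in synonyms
                  if s.entity == keyword and s.start_year <= year <= s.end_year]
    return {"keywords": [keyword] + expansions, "year": year}
```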
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
Combine time-dependent synonyms and temporal language models to determine the time of queries
Exploit temporal information extraction techniques to discover synonyms at particular time points
Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time periods
1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg, KDD'2002]
Generate bursty periods of ξ_ij by computing its rate of occurrence in the document stream
Output: bursty intervals and bursty weights, i.e., periods of occurrence and their intensity
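The burst-detection step can be illustrated with a minimal sketch of the two-state, batched variant of Kleinberg's automaton. It uses the settings listed in the experiment setting: two states, a rate ratio of 2 between the burst and base states, and gamma 1. Several assumptions apply, and this is not the authors' implementation. Documents are grouped into time batches (for example, months of NYT articles), and `relevant[t]` counts the documents in batch t that mention a synonym. `detect_bursts` is a hypothetical name. Following Kleinberg, a burst's weight is the total cost saved by being in the burst state over its interval.

```python
import math


def _cost(p, r, d):
    """Negative log-likelihood of r relevant out of d documents at rate p.

    The binomial coefficient is identical for both states, so it is dropped.
    """
    return -(r * math.log(p) + (d - r) * math.log(1 - p))


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return (start, end, weight) for each maximal run of batches in the burst state."""
    n = len(relevant)
    R, D = sum(relevant), sum(totals)
    if n == 0 or R == 0 or R >= D:
        return []
    p0 = R / D
    p = [p0, min(s * p0, (1 + p0) / 2)]  # keep the burst rate strictly below 1
    up = gamma * math.log(n)  # cost of moving from the base state to the burst state
    # Viterbi: the automaton starts in the base state, so entering state 1 at t=0 pays `up`
    cost = [_cost(p[0], relevant[0], totals[0]), up + _cost(p[1], relevant[0], totals[0])]
    back = []
    for t in range(1, n):
        r, d = relevant[t], totals[t]
        to0 = (cost[0], cost[1])       # staying in or dropping to state 0 is free
        to1 = (cost[0] + up, cost[1])  # rising to state 1 pays `up`
        back.append((0 if to0[0] <= to0[1] else 1, 0 if to1[0] < to1[1] else 1))
        cost = [min(to0) + _cost(p[0], r, d), min(to1) + _cost(p[1], r, d)]
    # backtrack the cheapest state sequence
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for prev0, prev1 in reversed(back):
        state = prev0 if state == 0 else prev1
        states.append(state)
    states.reverse()
    # collect runs of state 1 and their weights
    bursts, start = [], None
    for t, st in enumerate(states + [0]):
        if st == 1 and start is None:
            start = t
        elif st == 0 and start is not None:
            weight = sum(_cost(p[0], relevant[i], totals[i]) - _cost(p[1], relevant[i], totals[i])
                         for i in range(start, t))
            bursts.append((start, t - 1, weight))
            start = None
    return bursts


# Hypothetical monthly counts with a spike in months 4-6
bursts = detect_bursts([1, 1, 1, 1, 20, 25, 22, 1, 1, 1, 1, 1], [100] * 12)
```

Raising `gamma` makes entering the burst state more expensive, so only stronger or longer spikes survive. Raising `s` demands a larger jump in rate before a batch counts as bursty.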
Improving the accuracy of time using burst detection
Output: results from the burst detection algorithm
| Synonym | Entity | Burst weight | Start | End |
|---|---|---|---|---|
| President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989 |
| President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990 |
| President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993 |
| Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001 |
| Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003 |
| Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004 |
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Improving the accuracy of time using burst detection
Output: results from the burst-detection algorithm
Synonym | Entity | Burst weight | Start (MM/YYYY) | End (MM/YYYY)
President Reagan | Ronald Reagan | 5506858 | 01/1987 | 02/1989
President Ronald | Ronald Reagan | 100401 | 01/1989 | 03/1990
President Ronald | Ronald Reagan | 67208 | 07/1990 | 02/1993
Senator Clinton | Hillary Rodham Clinton | 18214 | 01/2001 | 10/2001
Senator Clinton | Hillary Rodham Clinton | 17732 | 05/2002 | 01/2003
Senator Clinton | Hillary Rodham Clinton | 172356 | 06/2003 | 11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama"
Class B: time-dependent
Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005
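The two classes can be illustrated with a small data structure. This is a minimal sketch under stated assumptions: a synonym carries an optional validity period at year granularity (end year exclusive), a synonym with no period is treated as time-independent, an ordinary search uses only time-independent synonyms, and a temporal search uses only the time-dependent synonyms valid at the query year. The `Synonym` class and `candidates` function are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Synonym:
    text: str
    start: Optional[int] = None  # first year the synonym applies (None = unbounded)
    end: Optional[int] = None    # year from which it no longer applies, exclusive (None = unbounded)

    @property
    def time_independent(self) -> bool:
        # Class A: no temporal restriction at all.
        return self.start is None and self.end is None

    def valid_at(self, year: int) -> bool:
        # Class B: valid only inside its period.
        return (self.start is None or year >= self.start) and (
            self.end is None or year < self.end
        )


def candidates(synonyms: List[Synonym], year: Optional[int] = None) -> List[str]:
    """Ordinary search (year is None): Class A synonyms.
    Temporal search: Class B synonyms valid at `year` (assumed split)."""
    if year is None:
        return [s.text for s in synonyms if s.time_independent]
    return [s.text for s in synonyms if not s.time_independent and s.valid_at(year)]
```

With the two examples from the slide, `candidates([Synonym("Cardinal Joseph Ratzinger", end=2005)], 2003)` returns the cardinal's name, while the same call for 2006, or with no year at all, returns an empty list.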
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot \overline{tf}(s_j), \qquad \overline{tf}(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
- $pf(s_j)$ is the time-partition frequency, i.e., the number of time partitions in which $s_j$ occurs
- $\overline{tf}(s_j)$ is the averaged term frequency of $s_j$: its term frequencies summed over all time partitions $p_i$, divided by $pf(s_j)$
- $\mu$ weights the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time: the more partitions a synonym occurs in, the more robust to time it is
- High usage over time: a high average frequency over time
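A direct transcription of the TIDP formula, as a sketch: it assumes each synonym's term frequencies are given as a mapping from time partition to count, and that ties are broken alphabetically. The slide does not say whether $pf$ and $\overline{tf}$ are normalized before mixing, so this version uses the raw values exactly as the formula is written; the two terms can therefore be on quite different scales. Function names are hypothetical.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * avg_tf(s_j)."""
    # pf: number of time partitions in which the synonym occurs at all.
    pf = sum(1 for tf in tf_by_partition.values() if tf > 0)
    if pf == 0:
        return 0.0
    # Averaged tf: total frequency over all partitions divided by pf.
    avg_tf = sum(tf_by_partition.values()) / pf
    return mu * pf + (1 - mu) * avg_tf


def rank_time_independent(synonyms, k, mu=0.5):
    """Return the top-k (synonym, TIDP score) pairs, highest score first."""
    scored = {s: tidp(parts, mu) for s, parts in synonyms.items()}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For example, a synonym with counts `{1: 4, 2: 0, 3: 2}` has $pf = 2$ and $\overline{tf} = 3$, so with $\mu = 0.5$ its score is $0.5 \cdot 2 + 0.5 \cdot 3 = 2.5$.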
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
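Since $TDP$ is just the term frequency at $t_k$, ranking reduces to a lookup and a sort. A minimal sketch, assuming the same partition-to-count mapping per synonym and that synonyms with zero frequency at $t_k$ are dropped (the slide does not say how they are handled); the names are hypothetical.

```python
def tdp(tf_by_partition, t_k):
    """TDP(s_j, t_k) = tf(s_j, t_k); zero if s_j does not occur at t_k."""
    return tf_by_partition.get(t_k, 0)


def rank_time_dependent(synonyms, t_k, k):
    """Top-k (synonym, TDP) pairs at time t_k, ignoring all other partitions."""
    scored = [(s, tdp(parts, t_k)) for s, parts in synonyms.items()]
    scored = [(s, w) for s, w in scored if w > 0]
    return sorted(scored, key=lambda sw: (-sw[1], sw[0]))[:k]
```

With hypothetical counts such as `{"Cardinal Ratzinger": {1995: 12, 2004: 30}, "Benedict XVI": {2005: 50}}`, a query at 2004 ranks only "Cardinal Ratzinger", and a query at 2005 ranks only "Benedict XVI": frequencies outside $t_k$ have no influence.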
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection:
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, 85 snapshots (01/03/2001, 01/02/2001, ..., 01/03/2008), about 2.8 terabytes; 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus, containing over 1.8 million articles from January 1987 to June 2007
Tools:
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst-detection algorithm implemented by Kleinberg, with number of states: 2; ratio of rate of second state to base state: 2; ratio of rate of each subsequent state to previous state: 2; gamma parameter of the HMM: 1
Measurement: accuracy
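The burst-detection step can be sketched as follows. This is a minimal two-state version of Kleinberg's batched model, assuming the input is a per-time-partition count: for example r[t], the number of NYT articles in month t that mention a synonym, out of d[t], all articles that month. It uses the parameters listed above: two states, a burst-state rate s = 2 times the base rate, and gamma = 1 scaling the cost gamma * ln(n) of entering the burst state. The function name and the 0.9999 cap on the burst rate are illustrative choices, not taken from the implementation used in the experiments, which may instead be Kleinberg's inter-arrival-time variant.

```python
import math


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state batched burst detection (sketch after Kleinberg).

    r[t]: items in batch t that mention the term; d[t]: all items in batch t.
    Returns (start, end, weight) for each maximal run in the burst state.
    """
    n = len(r)
    total_r, total_d = sum(r), sum(d)
    if n == 0 or total_r == 0 or total_r >= total_d:
        return []  # no signal, or every item matches: nothing to detect
    p0 = total_r / total_d
    p = [p0, min(s * p0, 0.9999)]  # burst rate = s * base rate, kept below 1

    def cost(i, rt, dt):
        # Negative log-likelihood of rt matches among dt items in state i; the
        # binomial coefficient is omitted because it is equal for both states.
        return -(rt * math.log(p[i]) + (dt - rt) * math.log(1.0 - p[i]))

    up = gamma * math.log(n)  # cost of moving up to the burst state; moving down is free
    prev = [0.0, math.inf]    # the automaton starts in the base state
    back = []
    for rt, dt in zip(r, d):
        cur, ptr = [], []
        for j in (0, 1):
            cands = [prev[i] + (up if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if cands[0] <= cands[1] else 1
            cur.append(cands[i_best] + cost(j, rt, dt))
            ptr.append(i_best)
        prev = cur
        back.append(ptr)

    # Viterbi backtrack from the cheaper final state.
    states = [0] * n
    state = 0 if prev[0] <= prev[1] else 1
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    bursts, t = [], 0
    while t < n:
        if states[t] == 1:
            start = t
            while t < n and states[t] == 1:
                t += 1
            # Burst weight: total cost saved by being in the burst state.
            weight = sum(cost(0, r[u], d[u]) - cost(1, r[u], d[u]) for u in range(start, t))
            bursts.append((start, t - 1, weight))
        else:
            t += 1
    return bursts
```

Mapping the returned partition indices back to months gives start/end pairs in the same form as the burst-detection output table; whether the weights are on the same scale as that table depends on details the slides do not give.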
Query expansion using time-independent synonyms
Data collection:
- TREC Robust Track (2004)
- 250 topics (topics 301-450 and topics 601-700)
Tools:
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$$q_{exp} = q_{org}\; s_1{\wedge}w_1\; s_2{\wedge}w_2\; \cdots\; s_k{\wedge}w_k$$
where $s_i{\wedge}w_i$ denotes synonym $s_i$ boosted by weight $w_i$, its TIDP score.
Measurement: mean average precision (MAP), R-precision and recall
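Building the expanded query string can be sketched as below. This assumes a caret-style `term^weight` boost syntax to mirror the formula, and it wraps multi-word synonyms in quotes; whether a particular engine accepts boosts on quoted phrases is not confirmed here and should be checked against that engine's query-language documentation. `expand_query` is a hypothetical helper.

```python
def expand_query(q_org, tidp_scores, k):
    """Append the top-k synonyms to q_org, each boosted by its TIDP score."""
    top = sorted(tidp_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for syn, w in top:
        term = f'"{syn}"' if " " in syn else syn  # quote multi-word synonyms
        parts.append(f"{term}^{w:.2f}")
    return " ".join(parts)
```

For instance, `expand_query("Barack Obama", {"Barack Hussein Obama II": 0.8, "Obama": 0.5, "Senator Obama": 0.2}, 2)` keeps the two highest-scoring synonyms and yields `Barack Obama "Barack Hussein Obama II"^0.80 Obama^0.50`.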
Query expansion using time-dependent synonyms
Data collection: NewsLibrary.com, which contains more than 182 million newspaper articles from thousands of credible US publications
20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents
Examples of temporal queries (named entity and time period) and their synonyms:
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods.
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ); k = 2, since k > 2 brings noise into the NERQ process
Number of queries under the two NER methods:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
MAP, R-precision and recall (significance tested at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model Bose-Einstein 1
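As a worked check on the table, the relative change of a method over a baseline is (new - base) / base. The sketch below applies it to the MW-NERQ MAP column, taking the printed values at face value; the percentages do not depend on whether the scores are read as fractions or as percentages.

```python
def relative_improvement(new, base):
    """Percentage change of `new` over `base`: (new - base) / base * 100."""
    return (new - base) / base * 100.0


# MAP values for MW-NERQ queries, as printed in the results table.
map_mw_nerq = {"PM": 0.2889, "PRF": 0.3469, "SQE-PRF": 0.3608, "SWQE-PRF": 0.3653}

for baseline in ("PM", "PRF"):
    gain = relative_improvement(map_mw_nerq["SWQE-PRF"], map_mw_nerq[baseline])
    print(f"SWQE-PRF vs {baseline}: {gain:+.1f}%")
# SWQE-PRF vs PM: +26.4%
# SWQE-PRF vs PRF: +5.3%
```

Applied to the MRW-NERQ column, the same function gives a negative change for SWQE-PRF against PRF (0.2885 vs 0.3002, about -3.9%).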
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance tested at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms with respect to time $t_q$
QUEST: Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Classifying synonyms into two types
DefinitionClass A time-independent
Robust to change over time and good synonym candidates for an ordinarysearch (no temporal criteria provided)
Eg ldquoBarack Hussein Obama IIrdquo is a time-independent synonym of ldquoBarackObamardquo
Class B time-dependent
Related to particular time in the past and good synonym candidates for atemporal search where changes in semantics must be considered
Eg ldquoCardinal Joseph Ratzingerrdquo is a time-dependent synonym of ldquoPopeBenedict XVIrdquo before 2005
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia:
NER Method | NE | NE-Syn | Avg. Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
- BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with the filtering criteria 1) time interval < 6 months and 2) average frequency < 2
- BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
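The two filtering steps and the "Avg. Syn per NE" column can be expressed as small predicates over entity-synonym records. This is a sketch under stated assumptions, not the authors' implementation. The `EntitySynonym` record and its fields are hypothetical. So is the reading of the criteria: we assume a pair is discarded when it falls below *either* threshold (interval under 6 months, or average frequency under 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySynonym:
    """Hypothetical record for one extracted entity-synonym relationship."""
    entity: str
    synonym: str
    interval_months: float  # assumed: time span over which the pair was observed
    avg_frequency: float    # assumed: average frequency over that span
    category: str           # assumed: Wikipedia category label of the entity


MIN_INTERVAL_MONTHS = 6        # criterion 1 on the slide
MIN_AVG_FREQUENCY = 2.0        # criterion 2 on the slide
BPCF_CATEGORIES = {"people", "organization", "company"}


def keep_bpf(rel: EntitySynonym) -> bool:
    # Assumption: failing either threshold is enough to discard the pair.
    return (rel.interval_months >= MIN_INTERVAL_MONTHS
            and rel.avg_frequency >= MIN_AVG_FREQUENCY)


def keep_bpcf(rel: EntitySynonym) -> bool:
    # BPCF-NERW = BPF-NERW restricted to three entity categories.
    return keep_bpf(rel) and rel.category in BPCF_CATEGORIES


def avg_synonyms_per_entity(rels):
    """'Avg. Syn per NE': number of relationships divided by distinct entities."""
    entities = {r.entity for r in rels}
    return len(rels) / len(entities) if entities else 0.0
```

The table's averages are consistent with this definition. 3,199,115 / 2,574,319 ≈ 1.24 and 488,383 / 473,829 ≈ 1.03, which round to 1.2 and 1.0.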
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; k > 2 brings noise into the NERQ process
Number of queries recognized by each NER method:
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
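The difference between the two recognizers can be illustrated with a toy sketch. Everything except k = 2 is an assumption. An in-memory set of titles stands in for Wikipedia. Token-overlap (Jaccard) scoring stands in for whatever ranking produced the "related" pages, since the slides do not name it. The sketch shows why MRW-NERQ recognizes more queries as named entities: a query with no exact title match can still be linked through related pages.

```python
def _tokens(text):
    """Lower-cased word set of a title or query."""
    return set(text.lower().split())


def mw_nerq(query, titles):
    """MW-NERQ: the query is a named entity only if it exactly matches a title."""
    return [query] if query in titles else []


def mrw_nerq(query, titles, k=2):
    """MRW-NERQ: exact match plus the top-k related titles.

    'Related' is approximated by Jaccard overlap of lower-cased tokens,
    a stand-in for the page ranking actually used (not given in the slides).
    Titles are assumed to be non-empty strings.
    """
    q = _tokens(query)

    def jaccard(title):
        t = _tokens(title)
        return len(q & t) / len(q | t)

    related = sorted(
        (t for t in titles if t != query and jaccard(t) > 0),
        key=lambda t: (-jaccard(t), t),  # best overlap first, ties by title
    )[:k]
    return mw_nerq(query, titles) + related


if __name__ == "__main__":
    titles = {"Pope Benedict XVI", "Benedict XV", "Pope John Paul II", "Kmart"}
    print(mw_nerq("pope benedict", titles))
    print(mrw_nerq("pope benedict", titles))
```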
Query expansion using time-independent synonyms
MAP, R-precision and Recall (the slide marked results statistically significant at p < 0.05; those markers did not survive transcription):
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504
- PM: Probabilistic Model without query expansion
- PRF: Pseudo Relevance Feedback using the Rocchio algorithm
- SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
- SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1 (Bo1).
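For readers less familiar with the three measures in this table, here is a minimal sketch of their standard per-topic definitions and of averaging over topics. It is not the evaluation tool used in the experiments. Recall here is measured over whatever depth the ranking is cut at; TREC runs are conventionally cut at 1,000 documents.

```python
def average_precision(ranking, relevant):
    """Sum of precision at each relevant hit, divided by the number of
    relevant documents (relevant documents never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def recall(ranking, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking) & set(relevant)) / len(relevant)


def mean_over_topics(metric, runs):
    """Average a per-topic metric; runs is a list of (ranking, relevant_set).
    mean_over_topics(average_precision, runs) is MAP."""
    return sum(metric(ranking, rel) for ranking, rel in runs) / len(runs)
```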
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 (significance markers at p < 0.05 were lost in transcription):
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
- TQ: search a Temporal Query, i.e., a keyword w_q and a time t_q
- TSQ: search a Temporal Query and expand it with Synonyms w.r.t. the time t_q
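The TQ and TSQ configurations differ only in whether the keyword is OR-ed with synonyms valid at the query time. The sketch below builds both from four rows of the "Examples of temporal queries" table. It rests on several assumptions. We reuse each row's time period as if it were the synonym's validity period. We treat t_q as a closed year interval and expand with every synonym whose period overlaps it. The quoted-phrase `OR` syntax is generic and hypothetical, not NewsLibrary.com's actual query language.

```python
# Four rows copied from the "Examples of temporal queries" slide:
# (named entity, start year, end year, synonym).
EXAMPLES = [
    ("Barack Obama", 2005, 2007, "Senator Obama"),
    ("Eminem", 1999, 2004, "Slim Shady"),
    ("Kmart", 1987, 1987, "Kresge"),
    ("Pope Benedict XVI", 1988, 2005, "Cardinal Ratzinger"),
]


def _overlaps(a, b):
    """True if the closed year intervals a and b share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def tq(w_q, t_q):
    """TQ: the keyword w_q alone, restricted to the time period t_q."""
    return {"text": f'"{w_q}"', "period": t_q}


def tsq(w_q, t_q, table=EXAMPLES):
    """TSQ: w_q OR-ed with every synonym whose period overlaps t_q."""
    terms = [w_q] + [syn for entity, start, end, syn in table
                     if entity == w_q and _overlaps((start, end), t_q)]
    return {"text": " OR ".join(f'"{t}"' for t in terms), "period": t_q}


if __name__ == "__main__":
    print(tq("Pope Benedict XVI", (1995, 2000))["text"])
    print(tsq("Pope Benedict XVI", (1995, 2000))["text"])
```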
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Conclusions:
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Classifying synonyms into two types
Definition:
- Class A, time-independent: robust to change over time, and good synonym candidates for an ordinary search (no temporal criteria provided). E.g., "Barack Hussein Obama II" is a time-independent synonym of "Barack Obama".
- Class B, time-dependent: related to a particular time in the past, and good synonym candidates for a temporal search, where changes in semantics must be considered. E.g., "Cardinal Joseph Ratzinger" is a time-dependent synonym of "Pope Benedict XVI" before 2005.
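One way to hold both classes in a single structure is to attach an optional validity period to each synonym. The sketch below is illustrative only. The `Synonym` type, its year-span `period`, and the `candidates` helper are hypothetical. The rule that an ordinary search uses only Class A while a temporal search uses only Class B synonyms valid at the query year is our reading of the definitions above. The two Obama synonyms come from examples elsewhere in these slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Synonym:
    text: str
    # None marks Class A (time-independent); a (start, end) year span marks Class B.
    period: Optional[Tuple[int, int]] = None


def candidates(synonyms: List[Synonym], year: Optional[int] = None) -> List[str]:
    """Expansion candidates for an ordinary search (year is None)
    or for a temporal search at the given year."""
    if year is None:
        # Ordinary search: only time-independent (Class A) synonyms.
        return [s.text for s in synonyms if s.period is None]
    # Temporal search: Class B synonyms whose period contains the year.
    return [s.text for s in synonyms
            if s.period is not None and s.period[0] <= year <= s.period[1]]


# Hypothetical synonym list for "Barack Obama".
OBAMA = [Synonym("Barack Hussein Obama II"), Synonym("Senator Obama", (2005, 2007))]
```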
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$\mathrm{TIDP}(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
- $pf(s_j)$ is the number of time partitions in which $s_j$ occurs
- $tf(s_j)$ is the averaged term frequency of $s_j$ over the time partitions: $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
- $\mu$ balances the importance of the temporal feature against the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high averaged frequency over time
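The TIDP weight can be computed directly from per-partition term frequencies. This minimal sketch assumes partitions are keyed by year; the slides do not state the partition granularity. The slides also do not say whether $pf$ and the averaged $tf$ are normalized before mixing, so the sketch uses the raw values exactly as the formula is written.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j), with tf averaged over
    the partitions in which s_j occurs.

    tf_by_partition maps a time partition (assumed here to be a year) to the
    term frequency of synonym s_j in that partition.
    """
    counts = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(counts)               # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(counts) / pf      # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


def top_k_time_independent(tfs_by_synonym, k, mu=0.5):
    """Rank an entity's synonyms by TIDP and keep the k best (ties broken by name)."""
    scored = [(s, tidp(tfs, mu)) for s, tfs in tfs_by_synonym.items()]
    scored.sort(key=lambda sw: (-sw[1], sw[0]))
    return scored[:k]
```

With raw values, a synonym seen in four partitions with tf 3 scores 0.5·4 + 0.5·3 = 3.5. One seen in a single partition with tf 10 scores 5.5. The frequency feature can therefore dominate unless the two features are on comparable scales.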
Ranking time-dependent synonyms
Definition: given a time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$\mathrm{TDP}(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
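Because TDP is just the term frequency at $t_k$, ranking reduces to a lookup and a sort. In this minimal sketch the frequency table is hypothetical toy data keyed by year. Synonyms with zero frequency at $t_k$ are dropped, which is our choice rather than something the slides specify.

```python
def tdp(tf_by_time, t_k):
    """TDP(s_j, t_k) = tf(s_j, t_k); 0 if s_j does not occur at t_k."""
    return tf_by_time.get(t_k, 0)


def rank_time_dependent(tfs_by_synonym, t_k, k):
    """Top-k synonyms at time t_k by TDP, skipping synonyms absent at t_k."""
    scored = [(s, tdp(tfs, t_k)) for s, tfs in tfs_by_synonym.items()]
    scored = [(s, w) for s, w in scored if w > 0]
    scored.sort(key=lambda sw: (-sw[1], sw[0]))  # highest tf first, ties by name
    return scored[:k]


# Hypothetical per-year frequencies for synonyms of "Pope Benedict XVI".
TFS = {
    "Cardinal Ratzinger": {1995: 40, 2008: 2},
    "Cardinal Joseph Ratzinger": {1995: 12},
    "Pope Benedict": {2008: 90},
}
```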
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time periods
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection:
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 28 Terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools:
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg, with parameters:
  - Number of states: 2
  - Ratio of the rate of the second state to the base state: 2
  - Ratio of the rate of each subsequent state to the previous state: 2
  - Gamma parameter of the HMM: 1
Measurement: accuracy
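Kleinberg's burst detection labels each time step with a hidden state, where higher states emit the term at a higher rate. It finds the cheapest state sequence with a Viterbi search. The sketch below implements the enumerated (batched-count) variant from Kleinberg's paper, using the slide's parameter values as defaults. It is an illustration, not the tool the authors ran; which variant that tool implements is not stated. Our mapping of parameters is an assumption: `first_ratio` is the rate ratio of the second state to the base state, `general_ratio` applies to each later state, and `gamma` scales the cost of moving up. Presumably the goal is to find months in which a synonym is unusually frequent in the NYT corpus; the inputs here are per-batch document counts.

```python
import math


def burst_states(relevant, totals, n_states=2, first_ratio=2.0,
                 general_ratio=2.0, gamma=1.0):
    """Label each batch with a state (0 = base rate, higher = burstier).

    relevant[t]: documents in batch t that mention the term (r_t <= d_t)
    totals[t]:   all documents in batch t
    """
    n = len(totals)
    if n == 0 or sum(relevant) == 0:
        return [0] * n
    # State i emits relevant documents with probability p[i]; p[0] is the overall rate.
    p = [min(sum(relevant) / sum(totals), 0.9999)]
    for i in range(1, n_states):
        ratio = first_ratio if i == 1 else general_ratio
        p.append(min(p[-1] * ratio, 0.9999))

    def emit_cost(i, r, d):
        # Negative log-likelihood of r relevant out of d under Binomial(d, p[i]).
        log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_choose + r * math.log(p[i]) + (d - r) * math.log(1 - p[i]))

    def trans_cost(i, j):
        # Moving up costs gamma * ln(n) per level; moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [0.0] + [math.inf] * (n_states - 1)  # start in the base state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in range(n_states):
            best = min(range(n_states), key=lambda i: cost[i] + trans_cost(i, j))
            new_cost.append(cost[best] + trans_cost(best, j) + emit_cost(j, r, d))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the best final state.
    state = min(range(n_states), key=lambda j: cost[j])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical monthly counts: a synonym spikes in months 5 to 7.
    print(burst_states([5, 5, 5, 5, 30, 35, 30, 5, 5, 5], [100] * 10))
```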
Query expansion using time-independent synonyms
Data collection: TREC Robust Track (2004), 250 topics (topics 301-450 and topics 601-700)
Tools: Terrier, an open source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores $w_1, \ldots, w_k$ as boosting weights:
$$q_{exp} = q_{org}\ s_1\text{\^{}}w_1\ s_2\text{\^{}}w_2 \cdots s_k\text{\^{}}w_k$$
Measurement: Mean Average Precision (MAP), R-precision and Recall
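The expanded query above appends each synonym with a `^weight` boost. The sketch builds such a string from a dictionary of TIDP scores. It assumes the `term^weight` boost syntax of Terrier's query language; check the Terrier documentation for the exact grammar of your version. Giving every word of a multi-word synonym the same boost is our own choice, since the slides do not say how multi-word synonyms were encoded. The scores in the usage line are hypothetical.

```python
def expand_query(q_org, tidp_scores, k):
    """Build q_exp = q_org s1^w1 ... sk^wk from the k highest-TIDP synonyms.

    tidp_scores maps each synonym string to its TIDP weight.
    """
    top_k = sorted(tidp_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [q_org]
    for synonym, weight in top_k:
        # Our choice: boost every word of a multi-word synonym by the same weight.
        parts.extend(f"{word}^{weight:g}" for word in synonym.split())
    return " ".join(parts)


if __name__ == "__main__":
    scores = {"cardinal ratzinger": 3.5, "benedict xvi": 5.5, "ratzinger": 1.0}
    print(expand_query("pope benedict", scores, k=2))
```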
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004): 250 topics (topics 301-450 and 601-700)
Tools
- Terrier: an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \dots, s_k$, using their TIDP scores as boosting weights $w_i = TIDP(s_i)$:
$$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$$
where $s_i^{w_i}$ denotes synonym $s_i$ boosted by weight $w_i$ (not a power).
Measurement: Mean Average Precision (MAP), R-precision and recall
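The expanded query can be assembled as a plain query string. A minimal sketch, assuming Terrier's `term^weight` syntax for boosting a query term, and assuming that a multi-word synonym is split into single terms that share its weight (the slide does not say how phrases were handled). The query, synonyms and scores in the usage line are made-up placeholders.

```python
from typing import List, Tuple


def expand_query(q_org: str, scored_synonyms: List[Tuple[str, float]], k: int) -> str:
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k, with w_i = TIDP(s_i)."""
    # keep the k synonyms with the highest TIDP scores
    top = sorted(scored_synonyms, key=lambda sw: sw[1], reverse=True)[:k]
    parts = [q_org]
    for synonym, weight in top:
        # assumption: each term of a multi-word synonym gets the synonym's weight
        parts.extend(f"{term}^{weight:.4f}" for term in synonym.split())
    return " ".join(parts)


print(expand_query("entity", [("synonym one", 0.8), ("alias", 0.5), ("other", 0.3)], k=2))
# entity synonym^0.8000 one^0.8000 alias^0.5000
```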
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com: more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms

| Named Entity | Time Period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |
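A temporal query is expanded only with synonyms that are valid at the query time. A minimal sketch using three rows of the example table, assuming year-granularity periods with inclusive bounds and a simple in-memory dictionary (`TIME_SYNONYMS` and `expand_temporal_query` are hypothetical names; the real system's storage and matching are not described on these slides):

```python
from typing import Dict, List, Tuple

# Three rows of the example table: named entity -> [(synonym, first year, last year)]
TIME_SYNONYMS: Dict[str, List[Tuple[str, int, int]]] = {
    "Pope Benedict XVI": [("Cardinal Ratzinger", 1988, 2005)],
    "Barack Obama": [("Senator Obama", 2005, 2007)],
    "Kmart": [("Kresge", 1987, 1987)],
}


def expand_temporal_query(w_q: str, t_q: int) -> List[str]:
    """Return the keyword w_q plus every synonym whose time period contains t_q."""
    synonyms = [syn for syn, start, end in TIME_SYNONYMS.get(w_q, [])
                if start <= t_q <= end]  # assumption: inclusive, year-level bounds
    return [w_q] + synonyms


print(expand_temporal_query("Pope Benedict XVI", 1995))  # ['Pope Benedict XVI', 'Cardinal Ratzinger']
print(expand_temporal_query("Pope Benedict XVI", 2008))  # ['Pope Benedict XVI']
```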
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER Method | #NE | #NE-Syn | Avg. Syn. per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with Filtering criteria: 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the Categories "people", "organization" or "company"
Note: the accuracy of time periods was assessed on 500 randomly selected entity-synonym relationships.
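The "Avg. Syn. per NE" column follows from the two count columns, and the category filter of BPCF-NERW keeps roughly 18% of the named entities found by BPF-NERW. A quick arithmetic check using only the numbers in the table:

```python
# (#NE, #NE-Syn) per NER method, copied from the table above
counts = {"BPF-NERW": (2_574_319, 3_199_115), "BPCF-NERW": (473_829, 488_383)}

# average number of synonyms per named entity = #NE-Syn / #NE
avg_syn_per_ne = {m: round(syn / ne, 1) for m, (ne, syn) in counts.items()}

# share of BPF-NERW entities kept by the category filter of BPCF-NERW, in percent
kept_pct = round(100 * counts["BPCF-NERW"][0] / counts["BPF-NERW"][0], 1)

print(avg_syn_per_ne, kept_pct)  # {'BPF-NERW': 1.2, 'BPCF-NERW': 1.0} 18.4
```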
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; k > 2 brings noise into the NERQ process

Number of queries under the two NER methods

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |
Query expansion using time-independent synonyms
MAP, R-precision and recall ( indicates statistically significant at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629 |
| PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761 |
| SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932 |
| SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504 |

PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
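For reference, the three measures in the table can be computed per query as follows, with MAP being the mean of average precision over all topics. This is a minimal sketch of the standard TREC-style definitions, not the evaluation tool used in the experiments; the document ids in the usage line are made up.

```python
from typing import Dict, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision at each rank holding a relevant document, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranked[:r]) / r if r else 0.0


def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved at all."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, Tuple[List[str], Set[str]]]) -> float:
    """runs maps a topic id to (ranked result list, set of relevant documents)."""
    if not runs:
        return 0.0
    return sum(average_precision(rk, rel) for rk, rel in runs.values()) / len(runs)


ranked, relevant = ["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}
print(average_precision(ranked, relevant), r_precision(ranked, relevant), recall(ranked, relevant))
```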
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | 0.1000 | 0.0500 | 0.0333 |
| TSQ | 0.5200 | 0.3800 | 0.2800 |

TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand it with Synonyms w.r.t. time $t_q$
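P@k is the fraction of the top k results that are relevant, averaged here over the 20 time-dependent queries. A minimal sketch; the ranking in the usage line is made up and only shows how, for a single query, one relevant document inside the top 10 gives 0.1000, 0.0500 and 0.0333 for k = 10, 20, 30.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def mean_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average P@k over queries, each given as (ranked list, relevant set)."""
    return sum(precision_at_k(rk, rel, k) for rk, rel in runs) / len(runs)


ranking = [f"doc{i}" for i in range(30)]   # hypothetical result list
relevant = {"doc3"}                        # one relevant document, at rank 4
print([round(precision_at_k(ranking, relevant, k), 4) for k in (10, 20, 30)])  # [0.1, 0.05, 0.0333]
```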
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest
Thank you
Ranking time-independent synonyms
Definition: time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
- $pf(s_j)$ is the time partition frequency of $s_j$, i.e., the number of time partitions in which $s_j$ occurs
- $tf(s_j)$ is the averaged tf of $s_j$ over all time partitions:
$$tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$$
- $\mu$ controls the relative importance of the temporal feature and the frequency feature; $\mu = 0.5$ yields the best performance in the experiments
Intuition: the model measures the popularity of synonyms based on two factors:
- Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
- High usage over time, i.e., a high averaged frequency over time
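A minimal sketch of the TIDP score for one synonym, given its term frequency in each time partition. It mixes the raw $pf$ and averaged $tf$ exactly as written in the formula; whether the original system normalized either feature before mixing is not stated on the slide, so treat that as an open assumption. The partition labels and counts are made up.

```python
from typing import Dict


def tidp(tf_by_partition: Dict[str, int], mu: float = 0.5) -> float:
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for a single synonym s_j.

    tf_by_partition maps each time partition p_i to tf(s_j, p_i).
    """
    occurring = [tf for tf in tf_by_partition.values() if tf > 0]
    pf = len(occurring)            # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occurring) / pf   # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf


# hypothetical counts: s_a is spread over time, s_b occurs in one partition only
s_a = {"2001": 3, "2002": 2, "2003": 4}
s_b = {"2001": 0, "2002": 4, "2003": 0}
print(tidp(s_a), tidp(s_b))  # 3.0 2.5
```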
Ranking time-dependent synonyms
Definition: given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms.
Time partitions are not considered, because only synonyms in the particular time period $t_k$ are of interest.
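Because TDP is just the term frequency within the chosen period, ranking time-dependent synonyms reduces to a sort. A minimal sketch; `rank_time_dependent` is a hypothetical helper, time periods are plain string labels, and the counts are made up.

```python
from typing import Dict, List, Tuple


def rank_time_dependent(tf_at: Dict[str, Dict[str, int]], t_k: str,
                        top_k: int) -> List[Tuple[str, int]]:
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k).

    tf_at maps each synonym s_j to its term frequency per time period.
    Synonyms that do not occur at t_k are left out.
    """
    scored = [(s, periods.get(t_k, 0)) for s, periods in tf_at.items()]
    scored = [(s, tf) for s, tf in scored if tf > 0]
    scored.sort(key=lambda st: (-st[1], st[0]))  # highest tf first, ties by name
    return scored[:top_k]


# hypothetical counts for three synonyms of one entity
tf_at = {"s1": {"2006": 5, "2009": 1}, "s2": {"2009": 9}, "s3": {"2006": 2, "2009": 4}}
print(rank_time_dependent(tf_at, "2006", 2))  # [('s1', 5), ('s3', 2)]
```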
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-independent synonyms
DefinitionTime-independent synonyms are weighted by a mixture model of a temporal featureand a frequency feature
TIDP(sj ) = micro middot pf (sj ) + (1minus micro) middot tf (sj )
pf (sj ) is the time partition frequency in which sj occurs
tf (sj ) is an averaged tf of sj in all time partitions tf (sj ) =sum
i tf (sj pi )
pf (sj )
micro underlines the importance of a temporal feature and a frequency feature
micro = 05 yields the best performance in the experiments
IntuitionThe model measures popularity of synonyms based on two factors
Robustness to change over time ie the more partitions synonyms occur themore robust to time they are
High usages over time ie a high value of averaged frequencies over time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com: more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a temporal query is a named entity plus a time period)
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods
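A hedged sketch of the BPF-NERW filtering step. The slide gives only the two thresholds; the record fields (`start`, `end`, `avg_freq`), measuring the interval in whole calendar months, and dropping a relationship when either criterion holds are all our assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class EntitySynonym:
    entity: str
    synonym: str
    start: date       # assumed: first snapshot in which the synonym was observed
    end: date         # assumed: last snapshot in which the synonym was observed
    avg_freq: float   # assumed: average frequency of the synonym per snapshot


def months_between(a: date, b: date) -> int:
    """Whole calendar months from a to b (day of month ignored)."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def passes_filter(rel: EntitySynonym, min_months: int = 6, min_avg_freq: float = 2.0) -> bool:
    # Assumption: drop the relationship if its interval is < 6 months OR its average frequency is < 2
    return months_between(rel.start, rel.end) >= min_months and rel.avg_freq >= min_avg_freq


def filter_relationships(rels: List[EntitySynonym]) -> List[EntitySynonym]:
    """Keep only the relationships that pass both thresholds."""
    return [rel for rel in rels if passes_filter(rel)]
```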
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2, since k > 2 brings noise into the NERQ process
Number of queries per type under the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
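MAP, R-precision and recall ( indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504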
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
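To read the MAP results as relative gains, the arithmetic on the reported MW-NERQ values is shown below; it only recomputes percentages from the table and adds no new results.

```python
# MAP values copied from the MW-NERQ column of the results table
map_mw_nerq = {"PM": 0.2889, "PRF": 0.3469, "SQE-PRF": 0.3608, "SWQE-PRF": 0.3653}


def relative_improvement(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


for method in ("PRF", "SQE-PRF", "SWQE-PRF"):
    gain = relative_improvement(map_mw_nerq[method], map_mw_nerq["PM"])
    print(f"{method}: {gain:+.1f}% MAP over PM")
```

This gives about +20.1%, +24.9% and +26.4% over PM; the same function puts SWQE-PRF about 5.3% above PRF (0.3653 vs 0.3469).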
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword $w_q$ and a time $t_q$
TSQ: search a temporal query and expand it with synonyms with respect to time $t_q$
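The TSQ setting can be sketched as a lookup from a named entity to its time-stamped synonyms, using a few rows of the example table of temporal queries as data. Keeping a synonym only when its time period contains $t_q$ is our assumption about what "with respect to time $t_q$" means; the slides do not state the matching rule or how the expanded terms are combined at search time, and `tsq_terms` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# A few (start year, end year, synonym) rows taken from the example table of temporal queries
SYNONYMS: Dict[str, List[Tuple[int, int, str]]] = {
    "Barack Obama": [(2005, 2007, "Senator Obama")],
    "Pope Benedict XVI": [(1988, 2005, "Cardinal Ratzinger")],
    "Virgin Media": [(1999, 2002, "Telewest Communications")],
}


def tsq_terms(w_q: str, t_q: int) -> List[str]:
    """Return the keyword plus every synonym whose time period contains t_q.

    Without a matching synonym this reduces to the plain temporal query (TQ).
    """
    terms = [w_q]
    for start, end, syn in SYNONYMS.get(w_q, []):
        if start <= t_q <= end:  # assumed rule: period must contain the query time
            terms.append(syn)
    return terms
```

For instance, `tsq_terms("Pope Benedict XVI", 1995)` adds "Cardinal Ratzinger", while a query time outside 1988-2005 leaves the keyword alone.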
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the NYT corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Ranking time-dependent synonyms
Definition: given a time $t_k$, the time-dependent synonyms at $t_k$ are weighted by
$TDP(s_j, t_k) = tf(s_j, t_k)$
where $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition: only term frequencies are used to measure the importance of synonyms
Time partitions are not considered, because only the synonyms in the particular time period $t_k$ are of interest
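A minimal sketch of TDP ranking, assuming documents are (year, text) pairs, that a time partition $t_k$ is a calendar year, and that tf is a naive case-insensitive substring count; the slides do not specify partition granularity or tokenization, and `rank_time_dependent` is a hypothetical name. Substring counting also matches a synonym embedded in a longer string, which a real tokenizer would avoid.

```python
from typing import Dict, Iterable, List, Tuple


def rank_time_dependent(docs: Iterable[Tuple[int, str]], synonyms: List[str], t_k: int) -> List[Tuple[str, int]]:
    """Rank synonyms at time t_k by TDP(s_j, t_k) = tf(s_j, t_k)."""
    tf: Dict[str, int] = {s: 0 for s in synonyms}
    for year, text in docs:
        if year != t_k:
            continue  # only documents in partition t_k contribute to tf(s_j, t_k)
        lowered = text.lower()
        for s in synonyms:
            tf[s] += lowered.count(s.lower())  # non-overlapping substring occurrences
    # highest TDP first; ties broken alphabetically for a stable order
    return sorted(tf.items(), key=lambda item: (-item[1], item[0]))
```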
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Time-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
Ranking time-dependent synonyms
DefinitionGiven time tk time-dependent synonyms at tk are weighted by
TDP(sj tk ) = tf (sj tk )
tf (sj tk ) is a term frequency of sj at tk
IntuitionOnly term frequencies will be used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular timeperiod tk are interesting
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Conclusions and future work
Conclusions:
- Extract time-based synonyms from Wikipedia
- Improve the time of synonyms using the New York Times Annotated Corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time (http://research.idi.ntnu.no/wislab/quest)
Thank you
Ranking time-dependent synonyms
Definition: given a time $t_k$, the time-dependent synonyms at $t_k$ are weighted by
$\mathrm{TDP}(s_j, t_k) = \mathrm{tf}(s_j, t_k)$
where $\mathrm{tf}(s_j, t_k)$ is the term frequency of synonym $s_j$ at $t_k$.
Intuition: only term frequencies are used to measure the importance of synonyms. Time partitions are not considered, because only the synonyms in the particular time period $t_k$ are of interest.
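Because TDP is just a per-period term frequency, ranking time-dependent synonyms reduces to counting. The sketch below assumes each document is labelled with a discrete time period and approximates $\mathrm{tf}(s_j, t_k)$ by case-insensitive substring counting; both choices are simplifications, and `tdp_rank` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def tdp_rank(synonyms: Iterable[str], docs: Iterable[Tuple[int, str]], t_k: int) -> List[Tuple[str, int]]:
    """Rank synonyms s_j by TDP(s_j, t_k) = tf(s_j, t_k).

    docs holds (period, text) pairs. tf(s_j, t_k) is approximated as the number
    of case-insensitive, non-overlapping occurrences of s_j in texts from t_k.
    """
    tf: Dict[str, int] = {s: 0 for s in synonyms}
    for period, text in docs:
        if period != t_k:
            continue  # only the period of interest counts; other partitions are ignored
        lowered = text.lower()
        for s in tf:
            tf[s] += lowered.count(s.lower())
    # Highest frequency first; ties broken alphabetically so the order is stable.
    return sorted(tf.items(), key=lambda item: (-item[1], item[0]))
```

For a query time $t_k$, the top entries of the returned list would be the candidates for expansion at that time.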
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting synonyms and improving the accuracy of their time
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms
Data collection:
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, 85 snapshots from 01/03/2001 to 01/03/2008, about 2.8 terabytes; plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools:
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Kleinberg's burst detection algorithm, with number of states = 2, ratio of the rate of the second state to the base state = 2, ratio of the rate of each subsequent state to the previous state = 2, and gamma parameter of the HMM = 1
Measurement: accuracy
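The burst detection parameters above correspond to Kleinberg's two-state automaton with rate ratio s = 2 and transition cost gamma = 1. A compact sketch of the batched (count-per-period) variant follows. It assumes the input is, for each period, the number of documents mentioning a synonym and the total number of documents (for example monthly counts from the NYT corpus). Because both rate ratios on the slide equal 2, a single ratio `s` is used. This is a textbook reconstruction of Kleinberg's model solved with the Viterbi algorithm, not the implementation the authors used; `kleinberg_bursts` is a hypothetical name.

```python
import math
from typing import List, Sequence


def kleinberg_bursts(
    relevant: Sequence[int],
    totals: Sequence[int],
    n_states: int = 2,
    s: float = 2.0,
    gamma: float = 1.0,
) -> List[int]:
    """Return one burst state per period (0 = base rate, higher = burstier).

    relevant[t]: documents in period t that mention the term.
    totals[t]:   all documents in period t.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals) if n else 0.0
    if p0 == 0.0:
        return [0] * n
    # State i emits with probability p0 * s**i, capped below 1 so log(1 - p) exists.
    probs = [min(p0 * s ** i, 1.0 - 1e-9) for i in range(n_states)]

    def cost(i: int, r: int, d: int) -> float:
        # Negative binomial log-likelihood; the binomial coefficient is the
        # same for every state, so it is dropped.
        p = probs[i]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    def trans(i: int, j: int) -> float:
        # Moving up (j - i) levels costs (j - i) * gamma * ln(n); moving down is free.
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    # best[i]: cheapest cost of a state sequence ending in state i (start in state 0).
    best = [trans(0, i) + cost(i, relevant[0], totals[0]) for i in range(n_states)]
    backptrs: List[List[int]] = []
    for t in range(1, n):
        new_best, ptrs = [], []
        for j in range(n_states):
            prev = min(range(n_states), key=lambda i: best[i] + trans(i, j))
            new_best.append(best[prev] + trans(prev, j) + cost(j, relevant[t], totals[t]))
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # Trace the optimal state sequence backwards.
    state = min(range(n_states), key=lambda i: best[i])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

Periods labelled 1 mark where the term's document rate is roughly double its overall rate; such bursty periods could then serve as evidence when adjusting the time period of an entity-synonym relationship.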
Query expansion using time-independent synonyms
Data collection:
- TREC Robust Track (2004): 250 topics (topics 301-450 and 601-700)
Tools:
- Terrier, an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms $s_1, \ldots, s_k$, using their TIDP scores as boosting weights:
$q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$
where the superscript $w_i$ on $s_i$ denotes a boosting weight, namely the TIDP score of $s_i$
Measurement: Mean Average Precision (MAP), R-precision and Recall
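Building $q_{exp}$ for SWQE-PRF can be sketched as string construction, assuming TIDP scores are already computed for each candidate synonym. The `term^weight` form follows the $q_{exp}$ notation above; quoting multi-word synonyms as phrases, the two-decimal formatting and the example scores are assumptions, and the exact syntax should be checked against the query language of the engine in use. `expand_query` is a hypothetical name.

```python
from typing import Dict


def expand_query(q_org: str, tidp: Dict[str, float], k: int) -> str:
    """Build q_exp = q_org s_1^w_1 ... s_k^w_k from the k highest-TIDP synonyms."""
    # Highest TIDP first; ties broken alphabetically for a deterministic query.
    top = sorted(tidp.items(), key=lambda item: (-item[1], item[0]))[:k]
    parts = [q_org]
    for synonym, weight in top:
        # Assumed convention: multi-word synonyms are submitted as quoted phrases.
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical TIDP scores, for illustration only.
    scores = {"Cardinal Ratzinger": 0.8, "Ratzinger": 0.3, "Joseph Ratzinger": 0.5}
    print(expand_query("Pope Benedict XVI", scores, 2))
    # Pope Benedict XVI "Cardinal Ratzinger"^0.80 "Joseph Ratzinger"^0.50
```

SQE-PRF would correspond to the same construction without the `^weight` suffix, i.e., all synonyms weighted equally.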
Query expansion using time-dependent synonyms
Data collection:
- NewsLibrary.com, which contains more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (named entity and time period) and their synonyms:
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H. W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia:
NER method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria (1) time interval < 6 months and (2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization" or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
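The two filtering criteria and the "Avg. #Syn per NE" column can be expressed directly. This sketch assumes each relationship carries a precomputed interval length in months and an average frequency, and reads the criteria as "discard a relationship if either threshold is not met". The slide does not spell out how the two criteria combine, and the records in the test data are hypothetical; `bpf_filter` and `avg_synonyms_per_entity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySynonym:
    entity: str
    synonym: str
    interval_months: float  # length of the relationship's time interval
    avg_frequency: float    # average frequency of the synonym (assumed precomputed)


def bpf_filter(rels: List[EntitySynonym], min_months: float = 6, min_avg_freq: float = 2) -> List[EntitySynonym]:
    """Drop relationships with a time interval < 6 months or an average
    frequency < 2 (assumed reading of the two filtering criteria)."""
    return [r for r in rels if r.interval_months >= min_months and r.avg_frequency >= min_avg_freq]


def avg_synonyms_per_entity(rels: List[EntitySynonym]) -> float:
    """Number of entity-synonym relationships divided by number of distinct entities."""
    entities = {r.entity for r in rels}
    return len(rels) / len(entities) if entities else 0.0
```

The averages in the table follow the same ratio: 3,199,115 / 2,574,319 ≈ 1.24 and 488,383 / 473,829 ≈ 1.03, which round to the reported 1.2 and 1.0.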
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Overview of experiments
Our experimental evaluation is divided into three main parts
1 Extracting and improving the accuracy of time of synonyms
2 Query expansion using time-independent synonyms
3 Query expansion using time-dependent synonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms: data collection
- The whole history of English Wikipedia: all pages and revisions from 03/2001 to 03/2008, i.e., 85 snapshots (01/03/2001, 01/02/2001, …, 01/03/2008), about 2.8 terabytes, plus 4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
- The New York Times Annotated Corpus: over 1.8 million articles from January 1987 to June 2007
Tools
- MWDumper (http://www.mediawiki.org/wiki/Mwdumper)
- Oracle Berkeley DB version 4.7.25
- Burst detection algorithm implemented by Kleinberg, with parameters:
  - number of states: 2
  - ratio of the rate of the second state to the base state: 2
  - ratio of the rate of each subsequent state to the previous state: 2
  - gamma parameter of the HMM: 1
Measurement: accuracy
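The burst-detection settings above can be illustrated with a small sketch of Kleinberg's two-state automaton. It is not the implementation the authors used. It applies the continuous-time version of the model to a list of event timestamps, such as the times at which a synonym is mentioned. It assumes the textbook formulation:
- The base-state rate equals the number of events per unit of time.
- The burst state fires s = 2 times faster.
- Moving up one state costs gamma · ln n, with gamma = 1.

With only two states, the "each subsequent state" ratio plays no role. The function name and input format are hypothetical.

```python
import math
from typing import List


def kleinberg_two_state(timestamps: List[float], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each inter-arrival gap with 0 (base state) or 1 (burst state).

    Minimal sketch of Kleinberg's continuous-time two-state automaton,
    decoded with the Viterbi algorithm.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    n = len(gaps)
    total = sum(gaps)
    if n == 0 or total <= 0:
        return [0] * n  # nothing to segment

    base_rate = n / total               # expected events per time unit (base state)
    rates = [base_rate, s * base_rate]  # the burst state fires s times faster
    up_cost = gamma * math.log(n)       # penalty for moving from state 0 up to state 1

    def emit_cost(state: int, gap: float) -> float:
        # negative log-density of an exponential gap under this state's rate
        rate = rates[state]
        return -math.log(rate) + rate * gap

    cost = [0.0, math.inf]  # the automaton starts in the base state
    backptr: List[List[int]] = []
    for gap in gaps:
        new_cost: List[float] = []
        ptr: List[int] = []
        for j in (0, 1):
            # moving up is penalised; staying or moving down is free
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best_i = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best_i] + emit_cost(j, gap))
            ptr.append(best_i)
        cost = new_cost
        backptr.append(ptr)

    # trace back the cheapest state sequence
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = backptr[t][state]
    return states


# Sparse mentions, a dense run of eight one-unit gaps, then sparse again.
times = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 37, 38, 48, 58]
print(kleinberg_two_state(times))  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

In this toy run, entering the burst state is worth its ln n penalty only because the cheaper emission cost of eight short gaps outweighs it. Raising gamma makes the detector more conservative.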
Query expansion using time-independent synonyms
Data collection
- TREC Robust Track (2004): 250 topics (topics 301-450 and topics 601-700)
Tools
- Terrier: an open-source search engine developed by the University of Glasgow
- BM25 probabilistic model with generic Divergence From Randomness (DFR) weighting
- Expand the query with the top-k synonyms s_1, …, s_k, using their TIDP scores w_1, …, w_k as boosting weights:
  $q_{exp} = q_{org}\; s_1^{w_1}\; s_2^{w_2} \cdots s_k^{w_k}$
  where $s_i^{w_i}$ denotes synonym $s_i$ boosted by weight $w_i$ (a query-term boost, not an exponent)
Measurement: Mean Average Precision (MAP), R-precision and recall
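The expansion formula can be made concrete with a small helper. It renders the original query followed by each of the top-k synonyms in the `term^weight` boost notation of the formula; Terrier's query language uses the same caret notation for term weights. Several details are assumptions: the function name, the phrase quoting of multi-word synonyms, and the example synonyms and weights are all hypothetical. Real weights would be the TIDP scores produced by the synonym-ranking step.

```python
from typing import List, Tuple


def build_expanded_query(original: str, synonyms: List[Tuple[str, float]], k: int) -> str:
    """Append the top-k synonyms to the original query as boosted terms.

    synonyms: (synonym, weight) pairs; weights stand in for TIDP scores.
    Multi-word synonyms are quoted as phrases (an assumption about the engine).
    """
    top_k = sorted(synonyms, key=lambda pair: pair[1], reverse=True)[:k]
    parts = [original]
    for synonym, weight in top_k:
        term = f'"{synonym}"' if " " in synonym else synonym
        parts.append(f"{term}^{weight:.2f}")
    return " ".join(parts)


# Hypothetical synonyms and weights, for illustration only.
print(build_expanded_query(
    "Pope Benedict XVI",
    [("Cardinal Ratzinger", 0.62), ("Joseph Ratzinger", 0.81), ("Ratzinger", 0.35)],
    k=2,
))
# Pope Benedict XVI "Joseph Ratzinger"^0.81 "Cardinal Ratzinger"^0.62
```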
Query expansion using time-dependent synonyms: data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- 20 strongly time-dependent queries were selected
Measurement: precision at 10, 20 and 30 retrieved documents (P@10, P@20, P@30)
Examples of temporal queries (a temporal query consists of a named entity and a time period):

| Named entity | Time period | Synonym |
|---|---|---|
| American Broadcasting Company | 1995-2000 | Disney/ABC |
| Barack Obama | 2005-2007 | Senator Obama |
| Eminem | 1999-2004 | Slim Shady |
| George H. W. Bush | 1988-1992 | President George H.W. Bush |
| George W. Bush | 2000-2007 | President George W. Bush |
| Hillary Rodham Clinton | 2001-2007 | Senator Clinton |
| Kmart | 1987-1987 | Kresge |
| Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger |
| Ronald Reagan | 1987-1989 | Reagan Revolution |
| Virgin Media | 1999-2002 | Telewest Communications |
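A TSQ-style expansion of a temporal query (w_q, t_q) can be sketched as follows. The code stores a few entity-synonym-period triples from the example table and adds only the synonyms whose period contains the query year. It makes two simplifying assumptions:
- Time periods are inclusive year ranges.
- The keyword must match the entity exactly.

The talk's own ranking of time-dependent synonyms is not reproduced, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (synonym, first year, last year) per named entity, from the example table above.
TIME_BASED_SYNONYMS: Dict[str, List[Tuple[str, int, int]]] = {
    "Barack Obama": [("Senator Obama", 2005, 2007)],
    "Eminem": [("Slim Shady", 1999, 2004)],
    "Kmart": [("Kresge", 1987, 1987)],
    "Pope Benedict XVI": [("Cardinal Ratzinger", 1988, 2005)],
    "Virgin Media": [("Telewest Communications", 1999, 2002)],
}


def synonyms_at(keyword: str, year: int) -> List[str]:
    """Synonyms of `keyword` whose (inclusive) time period covers `year`."""
    return [syn for syn, start, end in TIME_BASED_SYNONYMS.get(keyword, [])
            if start <= year <= end]


def expand_temporal_query(keyword: str, year: int) -> List[str]:
    """The keyword w_q plus the synonyms valid at time t_q."""
    return [keyword] + synonyms_at(keyword, year)


print(expand_temporal_query("Pope Benedict XVI", 1995))  # ['Pope Benedict XVI', 'Cardinal Ratzinger']
print(expand_temporal_query("Pope Benedict XVI", 2008))  # ['Pope Benedict XVI']
```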
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia

| NER method | NE | NE-Syn | Avg. Syn. per NE | Accuracy (%) |
|---|---|---|---|---|
| BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51 |
| BPCF-NERW | 473,829 | 488,383 | 1.0 | 73 |

- BPF-NERW: Bunescu and Pasca's Named Entity Recognition of Wikipedia titles, with the filtering criteria 1) time interval < 6 months and 2) average frequency < 2
- BPCF-NERW: BPF-NERW restricted to the categories “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
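The filtering step and the "Avg. Syn. per NE" column can both be checked in a few lines. The slide lists the two filtering criteria (time interval < 6 months, average frequency < 2) but does not say how they combine. This sketch assumes a relationship is discarded if either criterion holds. The record fields and example relations are hypothetical. The average is simply the number of NE-synonym relationships divided by the number of NEs, taken from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SynonymRelation:
    entity: str
    synonym: str
    interval_months: float  # length of the time period attached to the relationship
    avg_frequency: float    # average frequency of the synonym within that period


def keep_relation(rel: SynonymRelation) -> bool:
    # Assumption: a relationship is dropped if either slide criterion holds.
    return not (rel.interval_months < 6 or rel.avg_frequency < 2)


def filter_relations(rels: List[SynonymRelation]) -> List[SynonymRelation]:
    return [r for r in rels if keep_relation(r)]


def avg_synonyms_per_entity(num_entities: int, num_pairs: int) -> float:
    # "Avg. Syn. per NE" = number of NE-synonym relationships / number of NEs
    return num_pairs / num_entities


print(round(avg_synonyms_per_entity(2574319, 3199115), 2))  # 1.24 (BPF-NERW)
print(round(avg_synonyms_per_entity(473829, 488383), 2))    # 1.03 (BPCF-NERW)
```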
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ), with k = 2; values of k > 2 bring noise into the NERQ process

Number of queries of each type under the two NERQ methods:

| Type | MW-NERQ | MRW-NERQ |
|---|---|---|
| Named entity | 42 | 149 |
| Not named entity | 208 | 101 |
| Total | 250 | 250 |
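The two recognition strategies differ only in which Wikipedia pages may license a named-entity reading of a query. The sketch below assumes two inputs:
- a set of entity page titles;
- a hypothetical `related_pages` callable that returns related page titles ranked by relevance.

The slide does not specify how related pages are actually retrieved, so the lookup and the toy data are placeholders.

```python
from typing import Callable, Dict, List, Set


def mw_nerq(query: str, entity_titles: Set[str]) -> List[str]:
    """MW-NERQ: the query is an entity only if it exactly matches an entity page title."""
    return [query] if query in entity_titles else []


def mrw_nerq(query: str, entity_titles: Set[str],
             related_pages: Callable[[str], List[str]], k: int = 2) -> List[str]:
    """MRW-NERQ: the exact match plus any entity pages among the top-k related pages."""
    found = mw_nerq(query, entity_titles)
    for title in related_pages(query)[:k]:
        if title in entity_titles and title not in found:
            found.append(title)
    return found


# Toy data: hypothetical entity titles and a hypothetical related-page ranking.
titles = {"Kmart", "Eminem", "Barack Obama"}
related: Dict[str, List[str]] = {
    "senator obama 2006": ["Barack Obama", "United States Senate", "Eminem"],
}


def lookup(query: str) -> List[str]:
    return related.get(query, [])


print(mw_nerq("senator obama 2006", titles))           # []
print(mrw_nerq("senator obama 2006", titles, lookup))  # ['Barack Obama']
```

Limiting the scan to the top k related pages keeps lower-ranked matches (here "Eminem" at rank 3) out. This mirrors the slide's remark that k > 2 brings noise.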
Query expansion using time-independent synonyms
MAP, R-precision and recall ( indicates statistical significance at p < 0.05)

| Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall |
|---|---|---|---|---|---|---|
| PM | .2889 | .3309 | .6185 | .2455 | .2904 | .5629 |
| PRF | .3469 | .3711 | .6944 | .3002 | .3227 | .6761 |
| SQE-PRF | .3608 | .3652 | .7405 | .2507 | .2665 | .5932 |
| SWQE-PRF | .3653 | .3861 | .7388 | .2885 | .3080 | .6504 |

- PM: probabilistic model without query expansion
- PRF: pseudo-relevance feedback using the Rocchio algorithm
- SQE-PRF: top-k synonyms query expansion with pseudo-relevance feedback
- SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo-relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1 (Bo1).
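The PRF settings in the note (top-10 documents, 40 expansion terms, Bose-Einstein 1) can be illustrated with the Bo1 term-weighting formula from the DFR framework: w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n). Here tf_x is the frequency of t in the top-ranked documents, and P_n = F / N is the term's collection frequency F divided by the number of documents N. This sketch only ranks candidate terms. It does not reproduce Terrier's weight normalisation or the way expansion terms are merged with the original query. The toy counts are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bo1_weight(tf_x: int, coll_freq: int, num_docs: int) -> float:
    """Bose-Einstein 1 (Bo1) informativeness of a term in the feedback documents."""
    p_n = coll_freq / num_docs
    return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)


def expansion_terms(feedback_docs: List[List[str]], coll_freq: Dict[str, int],
                    num_docs: int, n_terms: int = 40) -> List[Tuple[str, float]]:
    """Rank the terms of the top-ranked (pseudo-relevant) documents by Bo1 weight."""
    tf_x = Counter(term for doc in feedback_docs for term in doc)
    scored = [(t, bo1_weight(c, coll_freq[t], num_docs))
              for t, c in tf_x.items() if coll_freq.get(t, 0) > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_terms]


# Toy feedback set (in the experiments: the top-10 retrieved documents).
docs = [["ratzinger", "pope", "vatican"], ["ratzinger", "the", "pope"]]
cf = {"ratzinger": 50, "pope": 400, "vatican": 300, "the": 900_000}
print([t for t, _ in expansion_terms(docs, cf, 1_000_000, n_terms=3)])
# ['ratzinger', 'pope', 'vatican']
```

Rare terms that recur in the feedback documents get high weights. Near-ubiquitous terms such as "the" score close to zero and drop out of the top n.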
Query expansion using time-dependent synonyms
P@10, P@20 and P@30 ( indicates statistical significance at p < 0.05)

| Method | P@10 | P@20 | P@30 |
|---|---|---|---|
| TQ | .1000 | .0500 | .0333 |
| TSQ | .5200 | .3800 | .2800 |

- TQ: search a temporal query, i.e., a keyword w_q and a time t_q
- TSQ: search a temporal query and expand it with synonyms with respect to time t_q
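Precision at k, the measure used for the time-dependent queries, is the fraction of the top-k retrieved documents that are relevant, averaged over queries. A minimal sketch follows. The rankings and relevance judgments in the test are made up, and the function names are hypothetical.

```python
from typing import List, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_precision_at_k(runs: List[Sequence[str]], qrels: List[Set[str]], k: int) -> float:
    """Average P@k over queries (one ranking and one relevance set per query)."""
    return sum(precision_at_k(r, rel, k) for r, rel in zip(runs, qrels)) / len(runs)
```

Dividing by k even when fewer than k documents are retrieved follows the usual convention: missing results count as non-relevant.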
QUEST: Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonymsData collection
The whole history of English WikipediaAll pages and revisions 032001 to 032008 ndash 85 snapshots (0103200101022001 01032008) about 28 Terabytes4 additional snapshots (24052008 27072008 08102008 06032009)
New York Time Annotated Corpus contains over 18 million articles from January1987 to June 2007
ToolsMWDumper httpwwwmediawikiorgwikiMwdumperOracle Berkeley DB version 4725
Burst detection algorithm implemented by KleinbergNumber of states 2Ratio of rate of second state to base state 2Ratio of rate of each subsequent state to previous state 2Gamma parameter of the HMM 1
Measurement Accuracy
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
Data collection
TREC Robust Track (2004)
250 topics (topics 301-450 and topics 601-700)
Tools
Terrier ndash an open source search engine developed by University of Glasgow
BM25 probabilistic model with Generic Divergence From Randomness (DFR)weighting
Expand the top-k synonyms s1 sk plus TIDP scores as boosting weight
qexp = qorg s1andw1 s2
andw2 skandwk
Measurement Mean Average Precision (MAP) R-precision and Recall
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonymsData collection
NewsLibrarycom contains more than 182 million newspaper articles fromthousands of credible US publications
Select 20 strongly time-dependent queries
Measurement Precision at 10 20 and 30 retrieved documents
Examples of temporal queriesTemporal Query SynonymNamed Entity Time Period
American Broadcasting Company 1995-2000 DisneyABCBarack Obama 2005-2007 Senator ObamaEminem 1999-2004 Slim ShadyGeorge H W Bush 1988-1992 President George HW BushGeorge W Bush 2000-2007 President George W BushHillary Rodham Clinton 2001-2007 Senator ClintonKmart 1987-1987 KresgePope Benedict XVI 1988-2005 Cardinal RatzingerRonald Reagan 1987-1989 Reagan RevolutionVirgin Media 1999-2002 Telewest Communications
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
Evaluation: Experiment Setting and Experimental Results
Query expansion using time-dependent synonyms
Data collection
- NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible US publications
- Queries: 20 strongly time-dependent queries were selected
- Measures: precision at 10, 20, and 30 retrieved documents (P@10, P@20, P@30)
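P@k is the fraction of the top-k retrieved documents that are relevant; per-query values are typically averaged over all queries. A minimal Python sketch, assuming one ranked list of document IDs and a set of judged-relevant IDs per query (the IDs below are made up for illustration):

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Ranks beyond the end of a short result list count as non-relevant,
    so the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average P@k over several queries, each given as (ranking, relevant set)."""
    return sum(precision_at_k(ranking, rel, k) for ranking, rel in runs) / len(runs)


# Hypothetical run for one query: 30 ranked documents, two of them relevant.
ranking = [f"d{i}" for i in range(1, 31)]
relevant = {"d3", "d17"}
for k in (10, 20, 30):
    print(f"P@{k} = {precision_at_k(ranking, relevant, k):.4f}")
# P@10 = 0.1000
# P@20 = 0.1000
# P@30 = 0.0667
```

Dividing by k rather than by the number of retrieved documents keeps short result lists from being rewarded.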
Examples of temporal queries (named entity and time period) with their synonyms
Named entity | Time period | Synonym
American Broadcasting Company | 1995-2000 | Disney/ABC
Barack Obama | 2005-2007 | Senator Obama
Eminem | 1999-2004 | Slim Shady
George H. W. Bush | 1988-1992 | President George H.W. Bush
George W. Bush | 2000-2007 | President George W. Bush
Hillary Rodham Clinton | 2001-2007 | Senator Clinton
Kmart | 1987-1987 | Kresge
Pope Benedict XVI | 1988-2005 | Cardinal Ratzinger
Ronald Reagan | 1987-1989 | Reagan Revolution
Virgin Media | 1999-2002 | Telewest Communications
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER method | #NE | #NE-Syn | Avg. #Syn per NE | Accuracy (%)
BPF-NERW | 2,574,319 | 3,199,115 | 1.2 | 51
BPCF-NERW | 473,829 | 488,383 | 1.0 | 73
BPF-NERW: Bunescu and Pasca's named entity recognition of Wikipedia titles, with filtering criteria 1) time interval < 6 months and 2) average frequency < 2
BPCF-NERW: BPF-NERW restricted to the categories "people", "organization", or "company"
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of the time periods.
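The two BPF-NERW filtering criteria can be sketched as a predicate over entity-synonym pairs. This is a minimal Python sketch under stated assumptions: the record layout (first and last date of the synonym's time interval plus a list of observed frequencies) is hypothetical, and a pair is assumed to be discarded if it fails either criterion, since the slide does not say how the two criteria are combined. The example pairs are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EntitySynonym:
    """Hypothetical record for one entity-synonym relationship."""
    entity: str
    synonym: str
    first_seen: date        # start of the synonym's time interval
    last_seen: date         # end of the synonym's time interval
    frequencies: list[int]  # assumed: synonym counts per observation period


def interval_in_months(start: date, end: date) -> int:
    """Whole calendar months between two dates (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def keep_pair(pair: EntitySynonym, min_months: int = 6, min_avg_freq: float = 2.0) -> bool:
    """Keep a pair only if its interval spans at least min_months AND its
    average frequency is at least min_avg_freq (assumed combination)."""
    if not pair.frequencies:
        return False
    avg_freq = sum(pair.frequencies) / len(pair.frequencies)
    long_enough = interval_in_months(pair.first_seen, pair.last_seen) >= min_months
    return long_enough and avg_freq >= min_avg_freq


# Invented example pairs.
pairs = [
    EntitySynonym("Entity A", "alias-1", date(2005, 4, 1), date(2008, 1, 1), [5, 3, 4]),
    EntitySynonym("Entity A", "alias-2", date(2006, 1, 1), date(2006, 3, 1), [4, 4]),
    EntitySynonym("Entity B", "alias-3", date(2004, 1, 1), date(2007, 1, 1), [1, 1, 2]),
]
kept = [p for p in pairs if keep_pair(p)]
print([(p.entity, p.synonym) for p in kept])  # [('Entity A', 'alias-1')]
```

Here alias-2 is dropped for its 2-month interval and alias-3 for its average frequency of about 1.33.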
Robust2004 query statistics
Two methods for recognizing named entities in queries (NERQ):
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
k = 2; k > 2 brings noise into the NERQ process
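The difference between MW-NERQ and MRW-NERQ can be illustrated with a small Python sketch. Everything here is an assumption for illustration: titles are matched after lower-casing and collapsing whitespace, and `related_pages` is a hypothetical precomputed mapping from a normalized query to ranked Wikipedia titles, standing in for whatever retrieval was used to find related pages. The titles and rankings are invented.

```python
def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (assumed matching rule)."""
    return " ".join(text.lower().split())


def recognize_entities(
    query: str,
    wiki_titles: list[str],
    related_pages: dict[str, list[str]],
    k: int = 2,
    use_related: bool = True,
) -> list[str]:
    """Return Wikipedia titles recognized as entities in the query.

    use_related=False mimics MW-NERQ (exact title match only);
    use_related=True mimics MRW-NERQ (exact match plus top-k related pages).
    """
    by_normalized = {normalize(t): t for t in wiki_titles}
    entities: list[str] = []
    exact = by_normalized.get(normalize(query))
    if exact is not None:
        entities.append(exact)
    if use_related:
        # Only the k highest-ranked related pages are considered.
        for title in related_pages.get(normalize(query), [])[:k]:
            if title not in entities:
                entities.append(title)
    return entities


titles = ["Hubble Space Telescope", "NASA", "Space telescope"]
related = {"hubble telescope achievements": ["Hubble Space Telescope", "NASA", "Space telescope"]}

print(recognize_entities("Hubble Telescope Achievements", titles, related, use_related=False))  # []
print(recognize_entities("Hubble Telescope Achievements", titles, related))  # ['Hubble Space Telescope', 'NASA']
```

The query above has no exactly matching title, so only the MRW-NERQ variant recognizes entities in it, which is consistent with MRW-NERQ labeling more queries as named entities. The `k=2` default follows the slide.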
Number of queries under the two NERQ methods
Type | MW-NERQ | MRW-NERQ
Named entity | 42 | 149
Not named entity | 208 | 101
Total | 250 | 250
Query expansion using time-independent synonyms
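MAP, R-precision, and recall (* indicates statistically significant at p < 0.05)
Method | MW-NERQ MAP | MW-NERQ R-precision | MW-NERQ Recall | MRW-NERQ MAP | MRW-NERQ R-precision | MRW-NERQ Recall
PM | 0.2889 | 0.3309 | 0.6185 | 0.2455 | 0.2904 | 0.5629
PRF | 0.3469 | 0.3711 | 0.6944 | 0.3002 | 0.3227 | 0.6761
SQE-PRF | 0.3608 | 0.3652 | 0.7405 | 0.2507 | 0.2665 | 0.5932
SWQE-PRF | 0.3653 | 0.3861 | 0.7388 | 0.2885 | 0.3080 | 0.6504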
PM: probabilistic model without query expansion
PRF: pseudo relevance feedback using the Rocchio algorithm
SQE-PRF: top-k synonyms query expansion with pseudo relevance feedback
SWQE-PRF: top-k synonyms TIDP-weighted query expansion with pseudo relevance feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1 (Bo1)
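The PRF baseline can be sketched as Rocchio-style positive feedback: treat the top-ranked documents as relevant, average their term vectors, and add the highest-weighted new terms to the query. A minimal Python sketch under stated assumptions: term weights are length-normalized term frequencies rather than the DFR Bo1 weighting noted above, and `alpha`/`beta` are common textbook defaults, not values taken from the slides. The slide's settings correspond to `n_terms=40` with the top 10 retrieved documents passed as `feedback_docs`.

```python
from collections import Counter


def rocchio_expand(
    query_terms: list[str],
    feedback_docs: list[list[str]],
    n_terms: int = 40,
    alpha: float = 1.0,
    beta: float = 0.75,
) -> dict[str, float]:
    """Rocchio pseudo relevance feedback with positive feedback only.

    feedback_docs holds the token lists of the top-ranked documents, all
    assumed relevant. Returns weights for the original query terms plus
    up to n_terms new expansion terms.
    """
    # Centroid of the feedback documents, using length-normalized tf
    # (a simple stand-in for DFR Bo1 term weighting).
    centroid: Counter = Counter()
    non_empty = [doc for doc in feedback_docs if doc]
    for doc in non_empty:
        for term, count in Counter(doc).items():
            centroid[term] += count / len(doc)
    for term in centroid:
        centroid[term] /= len(non_empty)

    query_vec = Counter(query_terms)
    # Best new terms by centroid weight (ties broken alphabetically).
    new_terms = sorted((t for t in centroid if t not in query_vec),
                       key=lambda t: (-centroid[t], t))[:n_terms]

    # q' = alpha * q + beta * centroid, restricted to query + expansion terms.
    return {t: alpha * query_vec[t] + beta * centroid[t]
            for t in list(query_vec) + new_terms}


docs = [["pope", "ratzinger", "vatican"], ["ratzinger", "pope", "pope", "election"]]
print(rocchio_expand(["pope"], docs, n_terms=2))
```

With `n_terms=2` the toy call keeps "ratzinger" and "vatican" as expansion terms and drops the lower-weighted "election".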
Query expansion using time-dependent synonyms
P@10, P@20, and P@30 (* indicates statistically significant at p < 0.05)
Method | P@10 | P@20 | P@30
TQ | 0.1000 | 0.0500 | 0.0333
TSQ | 0.5200 | 0.3800 | 0.2800
TQ: search a temporal query, i.e., a keyword w_q and a time t_q
TSQ: search a temporal query and expand it with synonyms w.r.t. the time t_q
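TSQ can be sketched as: keep the keyword w_q and add each synonym whose time period contains t_q; TQ corresponds to searching with w_q alone. A minimal Python sketch, assuming year-level time periods and a simple containment test. The entries come from the "Examples of temporal queries" table, while the lookup structure and the function name `expand_temporal_query` are hypothetical.

```python
# Entries taken from the "Examples of temporal queries" table:
# entity -> list of (synonym, first year, last year).
TIME_BASED_SYNONYMS: dict[str, list[tuple[str, int, int]]] = {
    "Pope Benedict XVI": [("Cardinal Ratzinger", 1988, 2005)],
    "Barack Obama": [("Senator Obama", 2005, 2007)],
    "Eminem": [("Slim Shady", 1999, 2004)],
    "Virgin Media": [("Telewest Communications", 1999, 2002)],
}


def expand_temporal_query(
    w_q: str,
    t_q: int,
    synonyms: dict[str, list[tuple[str, int, int]]] = TIME_BASED_SYNONYMS,
) -> list[str]:
    """TSQ-style expansion (sketch): keep the keyword w_q and add every
    synonym whose time period [start, end] contains the query year t_q."""
    expanded = [w_q]
    for synonym, start, end in synonyms.get(w_q, []):
        if start <= t_q <= end and synonym not in expanded:
            expanded.append(synonym)
    return expanded


print(expand_temporal_query("Pope Benedict XVI", 1995))
# ['Pope Benedict XVI', 'Cardinal Ratzinger']
print(expand_temporal_query("Pope Benedict XVI", 2010))
# ['Pope Benedict XVI']
```

The containment test means a synonym outside its valid period is never added, which is the intuition behind expanding with synonyms w.r.t. the time t_q.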
QUEST: Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Extracting and improving time of synonyms
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method NE NE-Syn Avg Syn Accuracyper NE ()
BPF-NERW 2574319 3199115 12 51BPCF-NERW 473829 488383 10 73
BPF-NERW Bunescu and Pascarsquos Named Entity Recognition of Wikipedia titles with Filtering criteria1) time interval lt 6 months and 2) average frequency lt 2
BPCF-NERW BPF-NERW with only the Categories of ldquopeoplerdquo ldquoorganizationrdquo or ldquocompanyrdquo
Note Randomly selected 500 entity-synonym relationships for assessing the accuracy of time periods
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Outline1 Introduction
Problem StatementContributions
2 Synonym DetectionEntity Recognition and Synonym ExtractionImproving the Accuracy of Time
3 Query ExpansionTime-based SynonymsRanking Time-independent SynonymsRanking Time-dependent Synonyms
4 EvaluationExperiment SettingExperimental Results
5 ConclusionsConclusions and Future Work
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Robust2004 query statistics
Two methods for recognizing named entities in queries1 Exactly matched Wikipedia page (MW-NERQ)
2 Exactly matched Wikipedia page and top-k related Wikipedia pages(MRW-NERQ)
k = 2 if k gt 2 bring noise to the NERQ process
Number of queries using two different NERType MW-NERQ MRW-NERQ
Named entity 42 149Not named entity 208 101Total 250 250
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-independent synonyms
MAP R-precision and Recall ( indicates statistically significant at p lt 005)Method MW-NERQ MRW-NERQ
MAP R-precision Recall MAP R-precision Recall
PM 2889 3309 6185 2455 2904 5629PRF 3469 3711 6944 3002 3227 6761SQE-PRF 3608 3652 7405 2507 2665 5932SWQE-PRF 3653 3861 7388 2885 3080 6504
PM Probabilistic Model without query expansion
PRF Pseudo Relevance Feedback using Rocchio algorithm
SQE-PRF Top-k Synonyms Query Expansion with Pseudo Relevant Feedback
SWQE-PRF Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevant Feedback
Note 40 expansion terms top-10 retrieved documents DFR term weighting model ie Bose-Einstein 1
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
Query expansion using time-dependent synonyms
P10 P20 and P30 ( indicates statistically significant at p lt 005)Method P10 P20 P30TQ 1000 0500 0333TSQ 5200 3800 2800
TQ search a Temporal Query ie a keyword wq and time tqTSQ search a Temporal Query and expand with Synonyms wrt time tq
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Experiment SettingExperimental Results
QUEST Query Expansion using Synonyms over Time
Conclusions and future work
- Extract time-based synonyms from Wikipedia
- Improve the accuracy of the synonyms' time using the New York Times (NYT) corpus
- Perform query expansion using the time-based synonyms
- Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work
- Combine time-dependent synonyms and temporal language models to determine the time of queries
- Exploit temporal information extraction techniques to discover synonyms at particular time points
- Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time, http://research.idi.ntnu.no/wislab/quest
Thank you
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
Conclusions and future work
Extract time-based synonym from WikipediaImprove time of synonyms using NYTPerform query expansion using the time-based synonymsConduct extensive experiments showing significantincrease in retrieval effectivenessFuture work
Combine time-dependent synonyms and temporallanguage models to determine time of queriesExploit temporal information extraction techniques todiscover synonyms at particular time pointsImprove temporal text miningclustering using time-basedsynonyms
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
IntroductionSynonym Detection
Query ExpansionEvaluation
Conclusions
Conclusions and Future Work
QUEST Query Expansion using Synonyms over Timehttpresearchidintnunowislabquest
Thank you
Kanhabua and Noslashrvaringg Exploiting Time-based Synonyms in Search
- Outline
- Main Talk
-
- Introduction
-
- Problem Statement
- Contributions
-
- Synonym Detection
-
- Entity Recognition and Synonym Extraction
- Improving the Accuracy of Time
-
- Query Expansion
-
- Time-based Synonyms
- Ranking Time-independent Synonyms
- Ranking Time-dependent Synonyms
-
- Evaluation
-
- Experiment Setting
- Experimental Results
-
- Conclusions
-
- Conclusions and Future Work
-
top related