The SemEval-2007 WePS Evaluation
Establishing a Benchmark for the Web People Search Task

Javier Artiles, Julio Gonzalo
UNED NLP & IR Group, Madrid, Spain (nlp.uned.es/~{javier, julio})

Satoshi Sekine
CS Department, New York University, USA (nlp.cs.nyu.edu/sekine)

Aarhus, 19 Sep 2008
The WePS Task
The Web People Search problem
The WePS 1 Task
Input: the first 100 search results for a person name.
Output: a clustering of those results according to the actual people they refer to.
John Smith 1 (Captain)
Captain John Smith - www.apva.org
John Smith Wikipedia - en.wikipedia.org/wiki…
…
John Smith 2 (Labour leader)
BBC: Labour leader John Smith – news.bbc.co.uk…
John Smith Wikipedia - en.wikipedia.org/wiki…
John Smith 3 (IBM researcher)
John Smith 4 (Film director)
John Smith 5 (Shoe company)
John Smith 6 (Writer)
…
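As a toy sketch of this input/output contract (the document indices and cluster labels follow the John Smith example above; the exact data format is illustrative only):

```python
# Input: the top (up to 100) search results for a person name.
# Each result is a (title, url) pair here.
results = [
    ("Captain John Smith", "www.apva.org"),
    ("BBC: Labour leader John Smith", "news.bbc.co.uk"),
    ("John Smith - Wikipedia", "en.wikipedia.org/wiki/John_Smith"),
]

# Output: a clustering of result indices, one cluster per real person.
# Note that a single page (e.g. a Wikipedia disambiguation page) may
# mention several people, and so may appear in more than one cluster.
clustering = {
    "John Smith 1 (Captain)": {0, 2},
    "John Smith 2 (Labour leader)": {1, 2},
}

# Every retrieved document should end up in at least one cluster.
assert {i for c in clustering.values() for i in c} == set(range(len(results)))
```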
The WePS Task
Person names are a frequent type of Web search: approximately 30% of queries to Web search engines include a person name.
But names can be very ambiguous: according to the U.S. Census Bureau, 90,000 distinct names are shared by 100 million people.
We can find:
– High ambiguity (e.g. 82 different people in 100 pages that mention “Martha Edwards”).
– Monopolized names (e.g. the top 100+ results for the search “Scarlett Johansson” all mention the famous actress).
A task with a clear application.
Why the WePS Task?
Also a relevant multilingual task!

Connections with traditional WSD, but also some exciting differences:
– Unknown number of “senses” (sense discrimination).
– Much higher average ambiguity…
– … but sharper boundaries between senses.
– A document might refer to different people with the same ambiguous name (a multiclass problem).

Receiving increasing attention from the IR/IE research community and from companies:
– ZoomInfo people search engine (www.zoominfo.com).
– Spock started a similar challenge just a few months ago (www.spock.com).
Data: training and test datasets
Training                                  Test
name source   av. entities  av. docs      name source   av. entities  av. docs
Wikipedia         23.14       99.00       Wikipedia         56.50       99.30
ECDL06            15.30       99.20       ACL06             31.00       98.40
WEB03 *            5.90       47.20       Census            50.30       99.10
total av.         10.76       71.02       total av.         45.93       98.93
Random selection of names from different sources (Wikipedia, US Census, CS conferences).
For each person name, we retrieved at most the top 100 documents (Yahoo! API).
Manual clustering of each set of documents.
* Gideon S. Mann, "Multidocument Statistical Fact Extraction and Fusion", Johns Hopkins University, 2006.
Different name sources should provide different ambiguity scenarios.
But we found high and unpredictable variability across test cases. This affected the balance between training and test sets, and added an (unintentional) challenge for systems.
Evaluation measures and Baselines
Purity: rewards clusters with less noise (fewer documents from other people).
Inverse Purity: rewards grouping all the elements of a category together.
F-measure (α = 0.5): harmonic mean of Purity and Inverse Purity.
Example baseline clusterings over 6 documents:
– Scattered (each document in its own cluster): P = 1.00, IP = 0.48, F0.5 = 0.65
– Joined (all documents in one cluster): P = 0.50, IP = 1.00, F0.5 = 0.67
– Combined (an all-in cluster plus one singleton cluster per document): P = 0.75, IP = 1.00, F0.5 = 0.86
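As a rough sketch (function names are mine; clusters and gold categories are represented as lists of sets of document ids, so overlapping clusterings are allowed), the measures can be computed as:

```python
def purity(clusters, gold):
    """Weighted average, over clusters, of the best overlap with a
    gold category: Purity = (1/n) * sum_i max_j |C_i ∩ L_j|."""
    n = sum(len(c) for c in clusters)  # clusters may overlap
    return sum(max(len(c & g) for g in gold) for c in clusters) / n

def inverse_purity(clusters, gold):
    # Same formula with the roles of clusters and categories swapped.
    return purity(gold, clusters)

def f_measure(clusters, gold, alpha=0.5):
    """F(alpha) = 1 / (alpha/P + (1-alpha)/IP); alpha = 0.5 gives
    the harmonic mean of Purity and Inverse Purity."""
    p, ip = purity(clusters, gold), inverse_purity(clusters, gold)
    return 1.0 / (alpha / p + (1.0 - alpha) / ip)
```

For instance, with a hypothetical gold standard of two people, `[{1, 2, 3}, {4, 5, 6}]`, the "joined" clustering `[{1, 2, 3, 4, 5, 6}]` gets P = 0.50, IP = 1.00, F0.5 ≈ 0.67.
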
Results
16 groups submitted results (the largest single task at SemEval).
Other issues
Current standard clustering evaluation measures can be cheated (see the combined baseline).
We therefore adapted the B-Cubed measure. See Enrique Amigó, Julio Gonzalo and Javier Artiles (2007), "Evaluation metrics for clustering tasks: a comparison based on formal constraints". http://nlp.uned.es/docs/amigo2007a.pdf
Inter-annotator agreement?
We produced a double annotation of the WePS test data.
Result: no significant swaps in the system ranking.
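For reference, a minimal sketch of the original (non-overlapping) B-Cubed precision and recall; the adapted version in the cited paper extends this to overlapping clusterings. Function and variable names here are mine:

```python
def bcubed_precision(cluster_of, person_of):
    """Average, over documents, of the fraction of same-cluster
    documents that really refer to the same person.
    cluster_of / person_of: dicts mapping document id -> label."""
    docs = list(cluster_of)
    total = 0.0
    for d in docs:
        same_cluster = [e for e in docs if cluster_of[e] == cluster_of[d]]
        correct = sum(1 for e in same_cluster if person_of[e] == person_of[d])
        total += correct / len(same_cluster)
    return total / len(docs)

def bcubed_recall(cluster_of, person_of):
    # Recall is precision with system clusters and gold persons swapped.
    return bcubed_precision(person_of, cluster_of)
```

Unlike Purity, B-Cubed scores each document individually, which is what blocks the combined-baseline cheat: putting every document in both a singleton and an all-in cluster no longer gets rewarded.
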
WePS 2
Clustering task (group documents by person) + Information Extraction task (extract person attributes)
Workshop in April 2009 (together with WWW 2009), in Madrid.
More info: http://nlp.uned.es/weps