oss 2013 - real world facets with entity resolution by benson margulies
Post on 06-Jul-2015
Real World Facets with Entity Resolution
Benson Margulies CTO
Basis Technology
The analyst’s dilemma
§ Andy’s job is to analyze interactions between European countries and Syria.
§ In particular, Andy wants to find unusual events on the Internet that he can report to his boss.
Government Analyst Andy
Haystack of raw data
§ The problem? Searching multiple messy datasets with both structured and unstructured data.
§ A common solution? Use complex, string-based queries to try to “structurize” messy data.
Analyst as Google keyword search pro
The limits of keyword search
The limitation of Google is that it is engaged with strings, not things.
What is Andy really looking for?
Co-occurrence
People Locations Organizations Date & Time Language
Entity resolution can help Andy…
§ Find references to REAL THINGS (people, places…)
§ Know that all mentions of SYRIA are one entity
§ Reference a master dataset to resolve entities, connecting names to knowledge sources
§ Ultimately, Andy can spend more time reading the right documents
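The idea that all mentions of SYRIA are one entity can be sketched as a lookup from normalized surface forms to an entity ID in a master dataset. This is a minimal illustration, not the approach used in the talk: the entity IDs, the alias lists, and the helper names (`normalize`, `build_alias_index`, `resolve`) are all hypothetical.

```python
import unicodedata

# Hypothetical master dataset: entity ID -> canonical name and known aliases.
KNOWLEDGE_BASE = {
    "ent:syria": {
        "name": "Syria",
        "aliases": ["Syria", "Syrian Arab Republic", "Syrie"],
    },
    "ent:laurent-fabius": {
        "name": "Laurent Fabius",
        "aliases": ["Laurent Fabius", "Fabius"],
    },
}


def normalize(text):
    """Fold case, unify Unicode forms, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def build_alias_index(kb):
    """Map every normalized alias to the ID of the entity it names."""
    index = {}
    for entity_id, record in kb.items():
        for alias in record["aliases"]:
            index[normalize(alias)] = entity_id
    return index


ALIAS_INDEX = build_alias_index(KNOWLEDGE_BASE)


def resolve(mention):
    """Return the entity ID for a mention, or None if it is unknown."""
    return ALIAS_INDEX.get(normalize(mention))
```

With this table, `resolve("SYRIA")` and `resolve("Syrian Arab Republic")` return the same ID, so documents can be indexed by thing rather than by string. An exact alias lookup does nothing for ambiguous names or misspellings, which is where the harder problems later in the talk begin.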
Facet search by location [Syria]
Filter by time range [Sept. 26-27, 2013]
Filter by person of interest [Laurent Fabius]
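The three steps above (facet by location, filter by time range, filter by person of interest) can be sketched as successive filters over documents that already carry resolved entity IDs. This is a simplified in-memory illustration with made-up documents; the `Document` fields, the entity IDs, and `facet_search` are assumptions, and a real deployment would push these filters into a search engine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str
    published: date
    locations: frozenset  # resolved location entity IDs
    people: frozenset     # resolved person entity IDs


# Made-up sample documents, for illustration only.
DOCS = [
    Document("d1", date(2013, 9, 26),
             frozenset({"ent:syria"}), frozenset({"ent:laurent-fabius"})),
    Document("d2", date(2013, 9, 27),
             frozenset({"ent:syria"}), frozenset()),
    Document("d3", date(2013, 10, 2),
             frozenset({"ent:syria"}), frozenset({"ent:laurent-fabius"})),
    Document("d4", date(2013, 9, 26),
             frozenset({"ent:france"}), frozenset({"ent:laurent-fabius"})),
]


def facet_search(docs, location=None, start=None, end=None, person=None):
    """Keep documents that satisfy every facet that was supplied."""
    hits = []
    for doc in docs:
        if location is not None and location not in doc.locations:
            continue
        if start is not None and doc.published < start:
            continue
        if end is not None and doc.published > end:
            continue
        if person is not None and person not in doc.people:
            continue
        hits.append(doc)
    return hits


# Narrow step by step, as Andy would in the interface.
by_location = facet_search(DOCS, location="ent:syria")
by_time = facet_search(by_location,
                       start=date(2013, 9, 26), end=date(2013, 9, 27))
by_person = facet_search(by_time, person="ent:laurent-fabius")
```

Because the facets match entity IDs rather than strings, a document that spells the name differently still lands in the same bucket, provided resolution assigned it the same ID.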
What’s Hard About All This?
• Who goes with whom?
• Many John Smiths, and no one can spell Ghadaffi.
• What happens when new things appear?
• …and its implications for scale.
• What happens when the system is rwong?
• This is a system that makes decisions that stick around.
Addressing ambiguity
Addressing variety
The shock of the new
• Starting point: digest a knowledge base, match new arrivals.
• For example, Wikipedia
• Here comes someone new; we don’t want to:
• Decide that Jones Smyth is John Smith
• Decide that Jones Smtyh is different from Jones Smyth
• … with more limited evidence
• Relationship to scale
• Now we have a data structure that gets modified
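The two failure modes above can be illustrated with a small store that matches each new arrival against everything it already knows and is modified as it goes. This is a rough sketch under loud assumptions: plain string similarity from Python's `difflib` stands in for the much richer evidence a real resolver would use, and the 0.85 threshold, the `EntityStore` class, and the entity IDs are all hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # assumed cut-off; a real system would tune this


class EntityStore:
    """Known entities plus every spelling seen for each one so far."""

    def __init__(self, seed):
        # seed: entity ID -> iterable of names from the knowledge base
        self.aliases = {eid: set(names) for eid, names in seed.items()}
        self._next_id = 1

    @staticmethod
    def _score(a, b):
        """Similarity in [0, 1]; string similarity only, as a stand-in."""
        return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

    def best_match(self, name):
        """Return (entity ID, score) for the closest known spelling."""
        best_id, best_score = None, 0.0
        for eid, names in self.aliases.items():
            for known in names:
                score = self._score(name, known)
                if score > best_score:
                    best_id, best_score = eid, score
        return best_id, best_score

    def ingest(self, name):
        """Link the name to a known entity or mint a new one.

        Returns (entity ID, confidence). Either way the store changes.
        """
        eid, score = self.best_match(name)
        if eid is not None and score >= MATCH_THRESHOLD:
            self.aliases[eid].add(name)  # remember the new spelling
            return eid, score
        new_id = f"new:{self._next_id}"
        self._next_id += 1
        self.aliases[new_id] = {name}
        return new_id, score


store = EntityStore({"kb:john-smith": ["John Smith"]})
first_id, first_score = store.ingest("Jones Smyth")    # someone new
second_id, second_score = store.ingest("Jones Smtyh")  # same person, typo
```

Here "Jones Smyth" falls below the threshold against "John Smith" and becomes a new entity, while "Jones Smtyh" scores above it against "Jones Smyth" and is merged. Every `ingest` call mutates the store and compares against all known spellings, which is the relationship to scale the slide points at; the returned score is the kind of confidence value a human reviewer could later use to find decisions worth correcting.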
Evaluation / Confidence
Human Correction
Thank you.