Page 1: OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies

Real World Facets with Entity Resolution

Benson Margulies, CTO

Basis Technology

The analyst’s dilemma

•  Andy’s job is to analyze interactions between European countries and Syria.

•  In particular, Andy wants to find unusual events on the Internet that he can report to his boss.

Government Analyst Andy

Haystack of raw data

•  The problem? Searching multiple messy datasets with both structured and unstructured data.

•  A common solution? Use complex, string-based queries to try to “structurize” messy data.

Analyst as Google keyword search pro

The limits of keyword search

The limitation of Google is that it deals with strings, not things.

What is Andy really looking for?

Co-occurrence

People, Locations, Organizations, Date & Time, Language
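The co-occurrence idea can be sketched in a few lines of Python. This is a minimal illustration, assuming an upstream step has already tagged each document with typed, resolved entities; the `(entity_type, entity_id)` tuples and the toy documents are hypothetical. It counts how many documents mention each pair of entities together, which is the raw signal behind "who appears with whom".

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered pair of entities, how many documents mention both."""
    counts = Counter()
    for entities in documents:
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) share a key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Toy input: each document is a set of (entity_type, entity_id) tuples that a
# hypothetical extraction and resolution step has already produced.
docs = [
    {("LOCATION", "Syria"), ("LOCATION", "France"), ("PERSON", "Laurent Fabius")},
    {("LOCATION", "Syria"), ("PERSON", "Laurent Fabius")},
    {("LOCATION", "Syria"), ("ORGANIZATION", "United Nations")},
]

counts = cooccurrence_counts(docs)
pair = (("LOCATION", "Syria"), ("PERSON", "Laurent Fabius"))
print(counts[pair])  # 2
```

Because the keys are entity IDs rather than raw strings, the counts only become meaningful once different spellings of the same entity have been merged.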

Entity resolution can help Andy…

•  Find references to REAL THINGS (people, places…)

•  Know that all mentions of SYRIA are one entity

•  Reference a master dataset to resolve entities, connecting names to knowledge sources

•  Ultimately, Andy can spend more time reading the right documents

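Resolving mentions against a master dataset can be sketched as an alias lookup. This is a minimal sketch, assuming a toy in-memory alias table; the `kb:` identifiers, the alias lists, and the `normalize` and `resolve` helpers are hypothetical, not Basis Technology's implementation. Normalization here only collapses case, accents, and spacing, so "SYRIA", "Syria", and "Sūriyā" land on one entity, while names missing from the master dataset return None.

```python
import unicodedata


def normalize(name):
    """Case-fold, strip accents, and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


# Hypothetical master dataset: entity ID -> known names (aliases).
MASTER = {
    "kb:syria": ["Syria", "Syrian Arab Republic", "Sūriyā"],
    "kb:laurent_fabius": ["Laurent Fabius", "L. Fabius"],
}

# Invert the master dataset into an index keyed on normalized alias.
ALIAS_INDEX = {
    normalize(alias): entity_id
    for entity_id, aliases in MASTER.items()
    for alias in aliases
}


def resolve(mention):
    """Return the entity ID for a mention, or None if the master dataset lacks it."""
    return ALIAS_INDEX.get(normalize(mention))


print(resolve("SYRIA"), resolve("Suriya"), resolve("Damascus"))
```

Exact alias lookup is only the easy case; misspellings and genuinely new names need the fuzzier handling the later slides discuss.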

Facet search by location [Syria]

Filter by time range [Sept. 26-27, 2013]

Filter by person of interest [Laurent Fabius]

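The three filtering steps (location, then time range, then person) can be sketched as successive filters over documents whose entities have already been resolved to IDs. The index layout, the `kb:` IDs, and the four toy documents are assumptions for illustration; a production system would more typically delegate this to a search engine's faceting support than filter Python lists.

```python
from collections import Counter
from datetime import date

# Hypothetical index: each document carries resolved entity IDs per facet.
DOCS = [
    {"id": 1, "date": date(2013, 9, 26),
     "location": {"kb:syria", "kb:france"}, "person": {"kb:laurent_fabius"}},
    {"id": 2, "date": date(2013, 9, 27),
     "location": {"kb:syria"}, "person": set()},
    {"id": 3, "date": date(2013, 9, 30),
     "location": {"kb:syria"}, "person": {"kb:laurent_fabius"}},
    {"id": 4, "date": date(2013, 9, 26),
     "location": {"kb:france"}, "person": {"kb:laurent_fabius"}},
]


def search(docs, location=None, start=None, end=None, person=None):
    """Apply each optional facet filter in turn; every filter narrows the hits."""
    hits = list(docs)
    if location is not None:
        hits = [d for d in hits if location in d["location"]]
    if start is not None:
        hits = [d for d in hits if d["date"] >= start]
    if end is not None:
        hits = [d for d in hits if d["date"] <= end]
    if person is not None:
        hits = [d for d in hits if person in d["person"]]
    return hits


def facet_counts(hits, facet):
    """Count matching documents per facet value, as a facet sidebar would."""
    return Counter(value for d in hits for value in d[facet])


syria = search(DOCS, location="kb:syria")
window = search(DOCS, location="kb:syria",
                start=date(2013, 9, 26), end=date(2013, 9, 27))
fabius = search(DOCS, location="kb:syria",
                start=date(2013, 9, 26), end=date(2013, 9, 27),
                person="kb:laurent_fabius")
print([d["id"] for d in syria], [d["id"] for d in window], [d["id"] for d in fabius])
# [1, 2, 3] [1, 2] [1]
```

Filtering on entity IDs rather than strings is what lets one facet value ("kb:syria") stand for every spelling the resolver has merged into it.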

What’s Hard About All This?

•  Who goes with whom?

•  Many John Smiths, and no one can spell Ghadaffi.

•  What happens when new things appear?

•  …and its implications for scale.

•  What happens when the system is wrong?

•  This is a system that makes decisions that stick around.
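The spelling problem ("no one can spell Ghadaffi") can be illustrated with character-level similarity from Python's standard `difflib` module. The variant list, the `kb:gaddafi` ID, the `resolve_fuzzy` helper, and the 0.75 cutoff are assumptions chosen for illustration. Note what this does not solve: string similarity cannot tell the many John Smiths apart, since identical names always score 1.0; separating them needs context.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: several romanizations that all name one entity.
VARIANTS = {
    "gaddafi": "kb:gaddafi",
    "qaddafi": "kb:gaddafi",
    "gadhafi": "kb:gaddafi",
    "kadhafi": "kb:gaddafi",
}


def resolve_fuzzy(mention, cutoff=0.75):
    """Return (entity_id, score) for the closest known variant.

    The entity_id is None when even the best score falls below the cutoff.
    """
    query = mention.casefold()
    best_id, best_score = None, 0.0
    for variant, entity_id in VARIANTS.items():
        # ratio() is 2*M/T: matched characters over the combined length of both strings.
        score = SequenceMatcher(None, query, variant).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_score >= cutoff:
        return best_id, best_score
    return None, best_score


print(resolve_fuzzy("Ghadaffi"))
print(resolve_fuzzy("Fabius"))
```

The cutoff is the knob behind "what happens when the system is wrong": lowering it merges more misspellings but also more genuinely different names.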

Addressing ambiguity  

Addressing variety  

The shock of the new

•  Starting point: digest a knowledge base, match new arrivals.

•  For example, Wikipedia

•  Here comes someone new; we don’t want to:

•  Decide that Jones Smyth is John Smith

•  Decide that Jones Smtyh is different from Jones Smyth

•  … with more limited evidence

•  Relationship to scale

•  Now we have a data structure that gets modified
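The "match new arrivals or create a new entity" behavior can be sketched as follows, assuming name similarity is the only evidence (real systems also weigh context and attributes). `GrowingKB`, its `link` method, the `e1`/`e2` IDs, and the 0.85 threshold are hypothetical. With these toy inputs, "Jones Smyth" scores about 0.76 against "John Smith" and becomes a new entity, while the typo "Jones Smtyh" scores about 0.91 against "Jones Smyth" and is linked to it.

```python
from difflib import SequenceMatcher
from itertools import count


class GrowingKB:
    """Toy knowledge base that links new mentions or mints new entities.

    Similarity is plain character-level matching on names (an assumption);
    no context or attributes are considered.
    """

    def __init__(self, seed_names, threshold=0.85):
        self.threshold = threshold
        self._ids = count(1)
        self.names = {}  # entity ID -> list of names seen for that entity
        for name in seed_names:
            self._create(name)

    def _create(self, name):
        entity_id = f"e{next(self._ids)}"
        self.names[entity_id] = [name]
        return entity_id

    def link(self, mention):
        """Attach the mention to the closest entity, or create one if none clears the threshold."""
        query = mention.casefold()
        best_id, best_score = None, 0.0
        for entity_id, names in self.names.items():
            for name in names:
                score = SequenceMatcher(None, query, name.casefold()).ratio()
                if score > best_score:
                    best_id, best_score = entity_id, score
        if best_id is not None and best_score >= self.threshold:
            self.names[best_id].append(mention)  # the KB is modified in place
            return best_id
        return self._create(mention)


kb = GrowingKB(["John Smith"])
a = kb.link("Jones Smyth")  # too far from "John Smith": new entity
b = kb.link("Jones Smtyh")  # close to the new arrival: linked to it
print(a, b, kb.names)
```

Each `link` call scans every stored name, so cost grows with the knowledge base; that is one face of the scale concern on the slide, and larger systems commonly restrict comparisons to a small candidate set first (often called blocking).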

Evaluation / Confidence

Human Correction

Thank you.

