Search as Communication: Lessons from a Personal Journey

Daniel Tunkelang, Head of Query Understanding, LinkedIn

Upload: daniel-tunkelang

Post on 09-May-2015

DESCRIPTION

Search as Communication: Lessons from a Personal Journey by Daniel Tunkelang (Head of Query Understanding, LinkedIn)

Presented at Etsy's Code as Craft Series on May 21, 2013

When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies on information retrieval; after all, I’ve spent most of my professional life working on search. But that’s not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I’d hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades.

As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem. In this talk, I’ll explain my communication-centric view of search, offering examples, general techniques, and open problems.

--

Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Before LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.

TRANSCRIPT

Page 1: Search as Communication: Lessons from a Personal Journey

Search as Communication: Lessons from a Personal Journey

Daniel Tunkelang, Head of Query Understanding, LinkedIn

Page 2:

These are great textbooks on information retrieval.

Page 3:

Unfortunately, I never read them in school.

But I did study graphs and stuff.

Page 4:

I found myself developing a search engine.

Page 5:

And the next thing I knew, I was a search guy.

Page 6:

So what did I learn along the way?

Page 7:

Search isn't a ranking problem. It's a communication problem.

Page 8:

Outline

1. Lessons from Library Science
2. Adventures with Information Extraction
3. A Moment of Clarity

Page 9:

1. Lessons from Library Science

Page 10:

USER: information need → query → select from results

SYSTEM: rank using IR model (tf-idf, PageRank)

A bird's-eye view of how search engines work.

Page 11:

Old school search: ask a librarian.

Page 12:

Search lives in an information-seeking context.

[Pirolli and Card, 2005]

Page 13:

vs.

Recognize ambiguity and ask for clarification.

Page 14:

Clarify, then refine.

Computers / Books

Page 15:

Faceted search. It’s not just for e-commerce.

Page 16:

Give users transparency, guidance, and control.
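As a rough illustration of "clarify, then refine", the sketch below counts how many results fall under each facet value (guidance and transparency) and then narrows the results once the user picks one (control). The result records, the `category` field, and both function names are hypothetical and do not come from the talk.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of `facet`."""
    return Counter(r[facet] for r in results if facet in r)


def refine(results, facet, value):
    """Keep only the results matching the facet value the user selected."""
    return [r for r in results if r.get(facet) == value]


# Hypothetical results for an ambiguous query, made up for illustration.
results = [
    {"title": "Laptop A", "category": "Computers"},
    {"title": "Laptop B", "category": "Computers"},
    {"title": "Laptop Repair Guide", "category": "Books"},
]

counts = facet_counts(results, "category")   # what the system can offer
books = refine(results, "category", "Books")  # what the user chose
```

Showing the counts next to each facet value is one way to ask for clarification without forcing the user to retype the query.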

Page 17:

Take-away for search engine developers:

Act like a librarian. Communicate with your user.

Page 18:

2. Adventures with Information Extraction

Page 19:

String matching is great but has limits.

Page 20:

for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k

People search for entities. Recognize them!
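The segmentation pseudocode on this slide can be sketched in Python roughly as follows. This is a minimal sketch, not the speaker's implementation. It assumes `pc` is a plain dictionary from candidate segment to probability, that `best[j]` holds segmentations of the first j words (so each new segment starts at word j+1), and that an empty segmentation at position 0 stands in for the slide's separate whole-prefix case. The probabilities in `PC` are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments chosen so far, in order
    prob: float   # product of the segment probabilities


def segment(words, pc, k=3):
    """Return up to k highest-probability segmentations of `words`.

    pc maps a candidate segment (a string) to its probability; segments
    missing from pc are treated as having probability 0.
    """
    n = len(words)
    # best[i] holds the top-k segmentations of the first i words.
    best = {0: [Segmentation((), 1.0)]}
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])   # segment covering words j+1 .. i
            p = pc.get(s, 0.0)
            if p > 0:
                for b in best[j]:
                    candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        candidates.sort(key=lambda a: a.prob, reverse=True)
        best[i] = candidates[:k]       # beam: keep only the k best
    return best[n]


# Hypothetical segment probabilities, made up for illustration.
PC = {
    "new": 0.02, "york": 0.01, "times": 0.02, "square": 0.01,
    "new york": 0.05, "times square": 0.04, "new york times": 0.03,
}

top = segment("new york times square".split(), PC, k=2)
```

With these made-up numbers the best segmentation is "new york" followed by "times square", with "new york times" followed by "square" as the runner-up. Truncating each `best[i]` to k entries keeps the search cheap at the cost of possibly dropping a segmentation that would have won later.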

Page 21:

Named entity recognition is free, as in free beer.

Page 22:

Problem: they process each document separately.

Entity Detection System

Why not take advantage of corpus features?

Page 23:

Give your documents the right to vote!

Use a high-recall method to collect candidates.
• e.g., all title-case spans of words, other than a single word beginning a sentence.

Process each document separately.
• Each candidate is assigned an entity type, or no type at all.

If a candidate is mostly assigned a single entity type, extrapolate to all its occurrences.
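The voting step above might look something like the sketch below. It assumes the per-document tagger has already run and produced one dictionary per document. The `min_share` threshold is a guess at what "mostly" means, and the labels in `docs` are invented for illustration.

```python
from collections import Counter, defaultdict


def vote_entity_types(per_doc_labels, min_share=0.7):
    """Pick one corpus-wide type per candidate when the votes mostly agree.

    per_doc_labels: one dict per document, mapping a candidate string to the
    type a per-document tagger assigned, or None for "no type at all".
    """
    votes = defaultdict(Counter)
    for labels in per_doc_labels:
        for candidate, entity_type in labels.items():
            votes[candidate][entity_type] += 1

    consensus = {}
    for candidate, counts in votes.items():
        winner, n = counts.most_common(1)[0]
        # Only extrapolate when a real type (not None) clearly dominates.
        if winner is not None and n / sum(counts.values()) >= min_share:
            consensus[candidate] = winner
    return consensus


def extrapolate(per_doc_labels, consensus):
    """Overwrite each document's labels with the corpus-wide consensus."""
    return [
        {c: consensus.get(c, t) for c, t in labels.items()}
        for labels in per_doc_labels
    ]


# Hypothetical per-document tagger output, made up for illustration.
docs = [
    {"Endeca": "ORG", "Cambridge": "LOC"},
    {"Endeca": "ORG", "Cambridge": "ORG"},
    {"Endeca": None, "Cambridge": "LOC"},
    {"Endeca": "ORG"},
]

consensus = vote_entity_types(docs)
relabeled = extrapolate(docs, consensus)
```

Here "Endeca" wins three of four votes as an organization, so the one document where the tagger assigned no type gets corrected. "Cambridge" is split two to one, which falls below the threshold, so its per-document labels are left alone.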

Page 24:

Looking for topics? Use idf, and its cousin ridf.

Inverse document frequency (idf)
• Too low? Probably a stop word.
• Too high? Could be noise.

Residual inverse document frequency (ridf)
• Predict idf using Poisson model.
• Difference between idf and predicted idf.

“a good keyword is far from Poisson” [Church and Gale, 1995]
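To make the two measures concrete, here is a small sketch that computes both from tokenized documents. It assumes base-2 logarithms and the usual Poisson prediction, where a term occurring cf times across N documents is expected to appear in a fraction 1 - exp(-cf/N) of them. The function name and the toy corpus are made up.

```python
import math
from collections import Counter


def idf_and_ridf(docs):
    """Return {term: (idf, ridf)} for a list of tokenized documents."""
    n_docs = len(docs)
    df = Counter()   # number of documents containing the term
    cf = Counter()   # total occurrences of the term in the corpus
    for tokens in docs:
        cf.update(tokens)
        df.update(set(tokens))

    stats = {}
    for term in df:
        idf = -math.log2(df[term] / n_docs)
        # Under a Poisson model with rate cf/N per document, the chance that
        # a document contains the term at least once is 1 - exp(-cf/N).
        predicted_idf = -math.log2(1 - math.exp(-cf[term] / n_docs))
        stats[term] = (idf, idf - predicted_idf)
    return stats


# Toy corpus, made up for illustration: both terms occur four times in
# total, but "the" is spread evenly while "search" is bursty.
docs = [
    ["the", "search", "search", "search", "search"],
    ["the", "cat"],
    ["the", "dog"],
    ["the", "bird"],
]

stats = idf_and_ridf(docs)
```

In this toy corpus "the" and "search" have the same total count, so the Poisson model predicts the same idf for both. Only "search" clusters in a single document, which gives it a high idf and a positive ridf, while the evenly spread "the" ends up with a negative ridf.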

Page 25:

Terminology extraction? Try data recycling.

Page 26:

Obtain entities by any means necessary.

Page 27:

Take-away for search engine developers:

Entity detection is crucial. And it isn’t that hard.

Page 28:

3. A Moment of Clarity

Page 29:

USER: information need → query → select from results

SYSTEM: rank using IR model (tf-idf, PageRank)

Let’s go back to our pigeons for a moment.

Page 30:

What does this process look like to the system?

vs.

Page 31:

And here’s what it looks like to the user.

GOOD / NOT SO GOOD

But can the system tell the difference?

Page 32:

User experience should reflect system confidence.

vs.

Page 33:

http://searchengineland.com/getting-organized-paid-search-user-intent-the-search-funnel-116312
Derived from [Jansen et al, 2007].

Searches reflect a variety of information needs.

Page 34:

for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k

We can segment information need from the query.

Page 35:

We can learn from analyzing user behavior.

Page 36:

And we can look at our relevance scores.

Navigational / Exploratory

Page 37:

Claudia Hauff, Query Difficulty for Digital Libraries [2009]

There are many pre- and post-retrieval signals.
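As one possible illustration of each kind of signal, the sketch below computes a pre-retrieval measure (the average idf of the query terms, available before any results are fetched) and a post-retrieval measure (how flat or peaked the relevance scores are). These two are chosen here as simple examples and are not taken from the slide. The function names, the handling of unseen terms, and the sample numbers are all assumptions.

```python
import math


def avg_query_idf(query_terms, doc_freq, n_docs):
    """Pre-retrieval signal: mean idf of the query terms.

    Terms absent from doc_freq are treated as appearing in one document.
    """
    idfs = [math.log2(n_docs / doc_freq.get(t, 1)) for t in query_terms]
    return sum(idfs) / len(idfs) if idfs else 0.0


def score_entropy(scores):
    """Post-retrieval signal: normalized entropy of the relevance scores.

    Near 0 when one result dominates, near 1 when the scores are flat.
    Assumes non-negative scores.
    """
    total = sum(scores)
    if total <= 0 or len(scores) < 2:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(scores))


# Made-up numbers for illustration.
specificity = avg_query_idf(["rare", "common"], {"rare": 1, "common": 8}, 8)
peaked = score_entropy([100, 1, 1, 1])   # one clear winner
flat = score_entropy([1, 1, 1, 1])       # nothing stands out
```

A peaked score distribution suggests the system has a confident answer, while a flat one suggests it may be better to offer refinements than to present a ranked list as if it were certain.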

Page 38:

Take-away for search engine developers:

Queries vary in difficulty. Recognize and adapt.

Page 39:

Review

1. Lessons from Library Science
• Act like a librarian. Communicate with users.

2. Adventures with Information Extraction
• Entity detection is crucial. And isn’t that hard.

3. A Moment of Clarity
• Queries vary in difficulty. Recognize and adapt.

Page 40:

Conclusion: Read the textbooks.

But treat search as a communication problem.

Page 41:

WE’RE HIRING! http://data.linkedin.com/search

Contact me: [email protected]

http://linkedin.com/in/dtunkelang