Transcript
Page 1: Search as Communication: Lessons from a Personal Journey

Search as Communication: Lessons from a Personal Journey

Daniel Tunkelang
Head of Query Understanding, LinkedIn

Page 2: Search as Communication: Lessons from a Personal Journey

These are great textbooks on information retrieval.

Page 3: Search as Communication: Lessons from a Personal Journey

Unfortunately, I never read them in school.

But I did study graphs and stuff.

Page 4: Search as Communication: Lessons from a Personal Journey

I found myself developing a search engine.

Page 5: Search as Communication: Lessons from a Personal Journey

And the next thing I knew, I was a search guy.

Page 6: Search as Communication: Lessons from a Personal Journey

So what did I learn along the way?

Page 7: Search as Communication: Lessons from a Personal Journey

Search isn't a ranking problem. It's a communication problem.

Page 8: Search as Communication: Lessons from a Personal Journey

Outline

1. Lessons from Library Science
2. Adventures with Information Extraction
3. A Moment of Clarity

Page 9: Search as Communication: Lessons from a Personal Journey

1. Lessons from Library Science

Page 10: Search as Communication: Lessons from a Personal Journey

USER: information need, query, select from results

SYSTEM: rank using IR model; tf-idf, PageRank

A bird's-eye view of how search engines work.

Page 11: Search as Communication: Lessons from a Personal Journey

Old school search: ask a librarian.

Page 12: Search as Communication: Lessons from a Personal Journey

Search lives in an information-seeking context.

[Pirolli and Card, 2005]

Page 13: Search as Communication: Lessons from a Personal Journey

vs.  

Recognize ambiguity and ask for clarification.

Page 14: Search as Communication: Lessons from a Personal Journey

Clarify, then refine.

Computers   Books  

Page 15: Search as Communication: Lessons from a Personal Journey

Faceted search. It’s not just for e-commerce.

Page 16: Search as Communication: Lessons from a Personal Journey

Give users transparency, guidance, and control.

Page 17: Search as Communication: Lessons from a Personal Journey

Take-away for search engine developers:

Act like a librarian. Communicate with your user.

Page 18: Search as Communication: Lessons from a Personal Journey

2. Adventures with Information Extraction

Page 19: Search as Communication: Lessons from a Personal Journey

String matching is great but has limits.

Page 20: Search as Communication: Lessons from a Personal Journey


for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
    sort B[i] by prob
    truncate B[i] to size k

People search for entities. Recognize them!
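The beam-search pseudocode on this slide could be sketched in Python roughly as follows. This is an illustration under assumptions, not the speaker's implementation: `pc` stands in for the slide's Pc(s) and is a hypothetical callable returning the probability that a string is a coherent segment, the toy probabilities are invented, and the sketch assumes each new segment starts at the word after position j so that segments never overlap. It also folds the "whole prefix is one segment" case into the main loop by seeding an empty segmentation at position 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments chosen so far, in order
    prob: float   # product of the segment probabilities


def segment(words, pc, k=3):
    """Return up to k best segmentations of `words` by beam search.

    pc(s) is a hypothetical callable giving the probability that the
    string s is a coherent segment (0.0 if it was never seen).
    """
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams = [[] for _ in range(n + 1)]
    beams[0] = [Segmentation((), 1.0)]   # empty prefix, probability 1
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])     # words j+1 .. i in 1-based terms
            p = pc(s)
            if p > 0:
                for b in beams[j]:
                    candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Keep only the k most probable partial segmentations.
        candidates.sort(key=lambda a: a.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Toy probabilities, invented purely for illustration.
TOY_PC = {
    "new": 0.02, "york": 0.01, "times": 0.02, "square": 0.01,
    "new york": 0.05, "times square": 0.04, "new york times": 0.03,
}

best = segment("new york times square".split(), lambda s: TOY_PC.get(s, 0.0))
print(best[0].segs)
```

With these toy numbers the top candidate keeps "new york" and "times square" together as units. The beam width k trades accuracy for speed: a larger k explores more alternatives at each position.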

Page 21: Search as Communication: Lessons from a Personal Journey

Named entity recognition is free, as in free beer.

Page 22: Search as Communication: Lessons from a Personal Journey

Problem: they process each document separately.

Entity Detection System

Why not take advantage of corpus features?

Page 23: Search as Communication: Lessons from a Personal Journey

Give your documents the right to vote!

Use a high-recall method to collect candidates.
• e.g., all title-case spans of words other than a single word beginning a sentence.

Process each document separately.
• Each candidate is assigned an entity type, or no type at all.

If a candidate is mostly assigned a single entity type, extrapolate to all its occurrences.
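The voting idea on this slide could look something like the sketch below. It assumes the per-document tagger has already run and produced, for each document, a mapping from candidate span to entity type (or `None` for no type). The function name, the input shape, and the `min_share` threshold for "mostly assigned a single type" are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def vote_entity_types(doc_labels, min_share=0.8):
    """Extrapolate per-document entity labels across a corpus.

    doc_labels: one dict per document mapping candidate span -> type,
    with None meaning the per-document tagger assigned no type.
    min_share: hypothetical threshold for "mostly assigned a single type".
    """
    votes = defaultdict(Counter)
    for labels in doc_labels:
        for candidate, etype in labels.items():
            votes[candidate][etype] += 1   # each document casts one vote

    consensus = {}
    for candidate, counts in votes.items():
        etype, top = counts.most_common(1)[0]
        # "No type" votes count against the winner but can never win.
        if etype is not None and top / sum(counts.values()) >= min_share:
            consensus[candidate] = etype
    return consensus


docs = [
    {"Acme Corp": "ORG", "Paris": "LOC"},
    {"Acme Corp": "ORG", "Paris": "PER"},
    {"Acme Corp": None, "Paris": "LOC"},
    {"Acme Corp": "ORG"},
    {"Acme Corp": "ORG"},
]
print(vote_entity_types(docs))
```

Here "Acme Corp" is labeled ORG in four of five documents, so that type would be applied to every occurrence, including the document where the tagger abstained. "Paris" is split between two types and falls below the threshold, so it is left alone.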

Page 24: Search as Communication: Lessons from a Personal Journey

Looking for topics? Use idf, and its cousin ridf.

Inverse document frequency (idf)
• Too low? Probably a stop word.
• Too high? Could be noise.

Residual inverse document frequency (ridf)
• Predict idf using Poisson model.
• Difference between idf and predicted idf.

“a good keyword is far from Poisson” [Church and Gale, 1995]
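As a rough sketch of the two measures on this slide: idf is -log2(df / D), where df is the number of documents containing the term and D is the corpus size. Under a Poisson model with rate cf / D (cf being the term's total occurrence count), a document contains the term with probability 1 - exp(-cf / D), which gives the predicted idf. The function names and the example counts below are assumptions made for illustration.

```python
import math


def idf(df, num_docs):
    """Inverse document frequency in bits: -log2(df / D)."""
    return -math.log2(df / num_docs)


def ridf(df, cf, num_docs):
    """Residual idf: observed idf minus the idf a Poisson model predicts.

    df: number of documents containing the term
    cf: total occurrences of the term in the corpus
    Under a Poisson model with rate cf / D, a document contains the
    term with probability 1 - exp(-cf / D).
    """
    predicted = -math.log2(1.0 - math.exp(-cf / num_docs))
    return idf(df, num_docs) - predicted


# Two invented terms, each occurring 100 times in a 1000-document corpus.
spread_out = ridf(df=95, cf=100, num_docs=1000)   # scattered across documents
bursty = ridf(df=10, cf=100, num_docs=1000)       # packed into few documents
print(round(spread_out, 3), round(bursty, 3))
```

The scattered term behaves almost exactly as Poisson predicts, so its ridf is near zero. The bursty term is concentrated in a handful of documents, far from Poisson, and gets a ridf above 3 bits, which is the pattern the quote associates with good keywords.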

Page 25: Search as Communication: Lessons from a Personal Journey

Terminology extraction? Try data recycling.

Page 26: Search as Communication: Lessons from a Personal Journey

Obtain entities by any means necessary.

Page 27: Search as Communication: Lessons from a Personal Journey

Take-away for search engine developers:

Entity detection is crucial. And it isn’t that hard.

Page 28: Search as Communication: Lessons from a Personal Journey

3. A Moment of Clarity

Page 29: Search as Communication: Lessons from a Personal Journey

USER: information need, query, select from results

SYSTEM: rank using IR model; tf-idf, PageRank

Let’s go back to our pigeons for a moment.

Page 30: Search as Communication: Lessons from a Personal Journey

What does this process look like to the system?

vs.  

Page 31: Search as Communication: Lessons from a Personal Journey

And here’s what it looks like to the user.

GOOD     NOT SO GOOD

But can the system tell the difference?

Page 32: Search as Communication: Lessons from a Personal Journey

User experience should reflect system confidence.

vs.  

Page 33: Search as Communication: Lessons from a Personal Journey

http://searchengineland.com/getting-organized-paid-search-user-intent-the-search-funnel-116312
Derived from [Jansen et al., 2007].

Searches reflect a variety of information needs.

Page 34: Search as Communication: Lessons from a Personal Journey


for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
    sort B[i] by prob
    truncate B[i] to size k

We can segment information need from the query.

Page 35: Search as Communication: Lessons from a Personal Journey

We can learn from analyzing user behavior.

Page 36: Search as Communication: Lessons from a Personal Journey

And we can look at our relevance scores.

Navigational     Exploratory
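One way to read relevance scores for this purpose, offered only as a guess at the idea behind the slide, is to check how concentrated the score mass is at the top of the ranking: a single dominant result suggests a navigational query, while a flat distribution suggests an exploratory one. The function names, the top-k cutoff, and the 0.5 threshold are all hypothetical, and the sketch assumes non-negative scores.

```python
def score_concentration(scores, k=10):
    """Share of the top-k score mass held by the single best result.

    Assumes non-negative relevance scores; returns 0.0 when there is
    nothing to measure.
    """
    top = sorted(scores, reverse=True)[:k]
    total = sum(top)
    if not top or total <= 0:
        return 0.0
    return top[0] / total


def guess_intent(scores, threshold=0.5, k=10):
    """Label a query from its score distribution (hypothetical threshold)."""
    if score_concentration(scores, k) >= threshold:
        return "navigational"
    return "exploratory"


# Invented score lists for two queries.
print(guess_intent([9.0, 1.0, 0.8, 0.5]))        # one result dominates
print(guess_intent([2.0, 1.9, 1.9, 1.8, 1.7]))   # many comparable results
```

In practice a signal like this would likely be combined with the segmentation and behavioral signals from the preceding slides rather than used on its own.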

Page 37: Search as Communication: Lessons from a Personal Journey

Claudia Hauff, Query Difficulty for Digital Libraries [2009]

There are many pre- and post-retrieval signals.

Page 38: Search as Communication: Lessons from a Personal Journey

Take-away for search engine developers:

Queries vary in difficulty. Recognize and adapt.

Page 39: Search as Communication: Lessons from a Personal Journey

Review

1. Lessons from Library Science
• Act like a librarian. Communicate with users.

2. Adventures with Information Extraction
• Entity detection is crucial. And isn’t that hard.

3. A Moment of Clarity
• Queries vary in difficulty. Recognize and adapt.

Page 40: Search as Communication: Lessons from a Personal Journey

Conclusion: Read the textbooks.

But treat search as a communication problem.

Page 41: Search as Communication: Lessons from a Personal Journey

WE’RE HIRING! http://data.linkedin.com/search

Contact me: [email protected]

http://linkedin.com/in/dtunkelang

