![Page 1: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/1.jpg)
Fast, Lenient, and Accurate
Building Personalized Instant Search Experience at LinkedIn
Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha
LinkedIn
![Page 2: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/2.jpg)
Agenda
● LinkedIn
● LinkedIn Search
○ Navigational vs Exploratory searches
○ Typeahead vs SERP
● Big picture and problem statement
● Instant search – Search-as-you-type
○ Query autocomplete
○ Entity-aware suggestions
○ Instant results
● Conclusions & Future work
![Page 3: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/3.jpg)
LinkedIn – Professional Identity
![Page 4: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/4.jpg)
LinkedIn – Professional Graph
![Page 5: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/5.jpg)
LinkedIn – Jobs
![Page 6: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/6.jpg)
LinkedIn – And much more...
Companies
Skills
Professional Content
![Page 7: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/7.jpg)
LinkedIn – Massive Scale
![Page 8: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/8.jpg)
LinkedIn Search
![Page 9: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/9.jpg)
Navigational Search
Looking for someone specific by name.
Query has a single correct result.
![Page 10: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/10.jpg)
Exploratory Search
Finding people that match a given set of criteria.
Multiple results match the user’s query.
![Page 11: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/11.jpg)
Instant Search – Search-as-you-type
Satisfy navigational searches: Show instant search results.
Help frame exploratory searches: Complete the user’s query and show search suggestions.
![Page 12: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/12.jpg)
Big Picture
Diagram components: Partial query, Instant results, Autocomplete, Search suggestions, Query tagger, Full-text search, Search results, Manually entered query
![Page 13: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/13.jpg)
Big Picture
Focus today:
● Autocomplete
● Search suggestions
● Instant results
![Page 14: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/14.jpg)
Problem Statement
How can we build an instant search experience that scales to 450+ million members, and is fast, lenient, and accurate?
● Instant search = Query autocomplete + search suggestions + instant results
● Fast = Search-as-you-type latencies
● Lenient = Handle spelling errors and common variations
● Accurate = Highly relevant and personalized results
![Page 15: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/15.jpg)
Query Tagging
PERSON
TITLE(ID=126)
COMPANY(ID=1337)
Entity types identified: Person name, job title, company, school, skills, locations.
Key part of query processing! Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking.
Sequential prediction model (CRF – Conditional Random Fields)
Training data:
● Standardized dictionaries (people names, companies, schools, titles, skills, locations)
● Query logs
● Clickthrough (CTR) data
● Crowdsourced labels
![Page 16: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/16.jpg)
Query Autocomplete
● Fast
● Relevant and contextual
● Resilient to spelling errors
![Page 17: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/17.jpg)
Query Autocomplete – Offline processing
Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
..
Entities: [linkedin] [software engineer]
Pipeline: Query logs → Entities → Index
FST – Finite State Transducers
Compact + fast retrieval + fuzzy match (via Levenshtein automata)
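To make the fuzzy-match idea concrete, here is a minimal sketch. It does not use an FST or a Levenshtein automaton: a plain dictionary-based trie and a row-by-row edit-distance table stand in for them, which gives the same pruning behaviour on a toy scale. The names `build_trie`, `fuzzy_lookup`, and `END` are illustrative assumptions, not from the talk.

```python
END = ""  # end-of-phrase marker; never collides with a real character

def build_trie(phrases):
    """Build a nested-dict trie; node[END] stores the complete phrase."""
    root = {}
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[END] = phrase
    return root

def fuzzy_lookup(root, word, max_edits):
    """Return sorted (phrase, distance) pairs within max_edits of `word`."""
    results = []

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for the trie path so far.
        row = [prev_row[0] + 1]
        for j in range(1, len(word) + 1):
            row.append(min(row[j - 1] + 1,
                           prev_row[j] + 1,
                           prev_row[j - 1] + (word[j - 1] != ch)))
        if END in node and row[-1] <= max_edits:
            results.append((node[END], row[-1]))
        if min(row) <= max_edits:  # otherwise no extension can recover
            for nxt, child in node.items():
                if nxt != END:
                    walk(child, nxt, row)

    for ch, child in root.items():
        if ch != END:
            walk(child, ch, list(range(len(word) + 1)))
    return sorted(results)
```

The pruning test on `min(row)` is what keeps the walk cheap: whole subtrees are skipped as soon as every alignment already exceeds the edit budget.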
![Page 18: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/18.jpg)
Query Autocomplete – Online processing
Two step process:
1. Retrieval (Candidate generation)
User’s query: [big data e]
Candidates = C(big data e) ∪ C(data e) ∪ C(e), where C(x) is the set of indexed completions of x
= big data engineer, big data expert systems, big data entry, ...
Query logs: linkedin software engineer, software engineer, big data, data scientist, data engineer, expert systems, ..
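The retrieval step can be sketched as follows. This is a minimal sketch, assuming a sorted in-memory list stands in for the FST index; `INDEX`, `completions`, and `candidates` are hypothetical names, and the sample phrases are made up to mirror the slide. Stitching each suffix completion back onto the words already typed is one plausible reading of how the union is formed.

```python
from bisect import bisect_left

# Toy stand-in for the index built from query logs (the talk uses an FST).
INDEX = sorted([
    "big data",
    "data engineer",
    "data entry",
    "data scientist",
    "engineer",
    "expert systems",
    "linkedin software engineer",
    "software engineer",
])

def completions(prefix, index=INDEX):
    """C(prefix): every indexed phrase that starts with `prefix`."""
    out = []
    for phrase in index[bisect_left(index, prefix):]:
        if not phrase.startswith(prefix):
            break  # sorted order: matches are contiguous
        out.append(phrase)
    return out

def candidates(query):
    """Union of C(suffix) over every word-level suffix of the query."""
    words = query.split()
    seen, result = set(), []
    for i in range(len(words)):
        head = " ".join(words[:i])      # words kept as typed
        suffix = " ".join(words[i:])    # part looked up in the index
        for phrase in completions(suffix):
            full = (head + " " + phrase).strip()
            if full not in seen:
                seen.add(full)
                result.append(full)
    return result
```

With this toy index, `candidates("big data e")` yields "big data engineer", "big data entry", and "big data expert systems", even though none of those full phrases was ever logged.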
![Page 19: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/19.jpg)
Query Autocomplete – Online processing
Two step process:
2. Scoring (Ranking)
User’s query: [big data e]
Candidate completions: “big data engineer”, “big data expert”, “big data entry”
Score(“big data engineer”):
P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)·… // Bigram language model
Use entities: P([engineer] | [big data])
Fall back to words: P(engineer | data)·P(data | big)
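A minimal sketch of the entity-first scoring with word-level fallback. The probability tables, the `FLOOR` smoothing constant, and the function name `completion_log_score` are assumptions for illustration; real values would be estimated from query logs, and the split of the query into entities is assumed to come from the query tagger.

```python
import math

# Made-up probabilities for illustration only.
ENTITY_BIGRAMS = {("big data", "engineer"): 0.20}
WORD_BIGRAMS = {
    ("big", "data"): 0.40,
    ("data", "engineer"): 0.10,
    ("data", "entry"): 0.05,
}
FLOOR = 1e-6  # assumed probability for unseen bigrams

def completion_log_score(context, completion):
    """Log-probability of `completion` following `context`.

    Uses the entity bigram P([completion] | [context]) when it is known,
    otherwise falls back to the product of word bigrams along the phrase.
    """
    key = (context, completion)
    if key in ENTITY_BIGRAMS:
        return math.log(ENTITY_BIGRAMS[key])
    words = context.split() + completion.split()
    return sum(math.log(WORD_BIGRAMS.get(pair, FLOOR))
               for pair in zip(words, words[1:]))
```

Working in log space avoids underflow when many small probabilities are multiplied, and leaves the ranking order unchanged.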
![Page 20: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/20.jpg)
Query Suggestions – Autocomplete + query tagger
“linke” ⇒ “Linkedin” ⇒ COMPANY
“had” ⇒ “Hadoop” ⇒ SKILL
![Page 21: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/21.jpg)
Instant Results
● Fast retrieval over 450+ million members
● Highly personalized
● Balance personalization & popularity
● Resilient to spelling variations
![Page 22: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/22.jpg)
Instant Results – Indexing
● Prefix-based tokenization (DOCID 4):
NAME: richard
PREFIX: r, ri, ric, rich, richa, ...
NAME: branson
PREFIX: b, br, bra, bran, brans, ...
● Inverted index (maps each token to the list of docs that contain it, i.e. its posting list):
NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
…
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b
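The indexing and retrieval scheme on this slide can be sketched in a few lines. This is a toy in-memory version under stated assumptions: the function names are hypothetical, the full word is also emitted as a PREFIX token (the slide's list is elided, so this is a guess), and every `+` term is treated as a required match.

```python
from collections import defaultdict

def tokens_for(name):
    """One NAME token per word, plus a PREFIX token for each of its prefixes."""
    out = []
    for word in name.lower().split():
        out.append("NAME:" + word)
        out.extend("PREFIX:" + word[:i] for i in range(1, len(word) + 1))
    return out

def build_index(docs):
    """Map each token to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for tok in tokens_for(name):
            index[tok].add(doc_id)
    return {tok: sorted(ids) for tok, ids in index.items()}

def rewrite(query):
    """Completed words become NAME terms; the word being typed is a PREFIX."""
    words = query.lower().split()
    if not words:
        return []
    return ["NAME:" + w for w in words[:-1]] + ["PREFIX:" + words[-1]]

def search(index, query):
    """Intersect the posting lists of all required terms."""
    postings = [set(index.get(t, ())) for t in rewrite(query)]
    return sorted(set.intersection(*postings)) if postings else []
```

Trading index size for speed is the point of the design: a prefix query becomes an exact term lookup, so no term-dictionary scan is needed at query time.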
![Page 23: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/23.jpg)
Instant Results – Indexing
● Connections index (DOCID 4):
CONN: 1, 10, 15
● Inverted index
CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
CONN:1 => [4, ...]
CONN:10 => [4, ...]
...
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b +CONN:1
(Everyone named richard b… and connected to User:1)
![Page 24: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/24.jpg)
Instant Results – Indexing
Early Termination
Problem: A query like [PREFIX:ri] might retrieve too many candidate documents.
How can we retrieve the most promising documents first so that we don’t need to score all of them?
Static Rank: Order documents based on their prior (query independent) likelihood of relevance:
A combination of:
● Profile views
● Spam and security related scores
● Editorial rules (Celebrities, influencers, …)
numToScore: The number of documents to retrieve and score for any query
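A minimal sketch of early termination, assuming posting lists are stored in descending static-rank order so that the most promising documents are seen first. The function names and the idea of passing the other required terms as plain lists are illustrative assumptions; how the static-rank signals are combined into one score is not given in the slides and is left out.

```python
def static_rank_order(static_rank):
    """Doc ids sorted from highest to lowest static rank."""
    return sorted(static_rank, key=static_rank.get, reverse=True)

def retrieve(posting, required, num_to_score):
    """Walk `posting` (already in static-rank order) and stop early.

    `required` holds the posting lists of the other mandatory terms.
    Only the first `num_to_score` matching docs are returned for scoring.
    """
    others = [set(p) for p in required]
    hits = []
    for doc_id in posting:
        if all(doc_id in s for s in others):
            hits.append(doc_id)
            if len(hits) == num_to_score:
                break  # early termination: the tail is never examined
    return hits
```

Because the walk follows static rank, whatever is cut off by `num_to_score` is, by construction, the least promising part of the list.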
![Page 25: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/25.jpg)
Instant Results – Retrieval
Balancing Popularity and Personalization
Query: richard b…
Are you looking for Richard Branson, or a colleague named Richard Burton?
(Assume searcher’s ID = 1)
Rewritten Query:
● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections.
● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of the results from the searcher’s connections
Custom search operator: “Weighted OR”
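The slides do not spell out the exact semantics of the “Weighted OR” operator. One plausible reading, sketched below under that assumption, is a quota: reserve a share of the `numToScore` budget for documents matching the optional clause, then fill the rest from the remaining candidates. The function name and parameters are hypothetical.

```python
def weighted_or(candidates, optional, share, num_to_score):
    """Pick up to num_to_score docs, reserving `share` of them for `optional`.

    `candidates` already satisfy the required (+) terms and are in
    static-rank order; `optional` is the posting list of the ? clause.
    """
    quota = int(num_to_score * share)
    optional = set(optional)
    preferred = [d for d in candidates if d in optional][:quota]
    chosen = set(preferred)
    rest = [d for d in candidates if d not in chosen]
    # If the optional clause matches fewer docs than its quota,
    # the unused slots go back to the general pool.
    return preferred + rest[:num_to_score - len(preferred)]
```

The returned list is only the set of documents handed to the scorer; final ordering is decided later by the ranking model.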
![Page 26: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/26.jpg)
Instant Results – Spelling Variations
weiner ⇔ wiener
catherine ⇔ kathryn
dipak ⇔ deepak
![Page 27: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/27.jpg)
Instant Results – Spelling Variations
Name Clusters
Offline process to cluster together similar-sounding or similarly spelled names.
Two step process:
1. Coarse clustering (optimized for broad coverage)
○ Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
○ Combination of edit distance & double metaphone (sound)
○ E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)
2. Fine-grained clustering (optimized for precision)
○ Split up clusters based on more sophisticated rules
○ Position and character-aware edit distance
○ Query reformulation data (q1 → q2 → click)
○ E.g. (jeff ≠ joff)
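A rough sketch of the coarse step, covering only normalization and plain edit distance. Double metaphone is not in the Python standard library and is omitted here, so pairs that need the phonetic signal (such as catherine and kathryn) will not be merged by this sketch. The function names and the threshold of one edit are assumptions, and the blanket c → k replacement is deliberately crude.

```python
import re
import unicodedata

def normalize(name):
    """Lowercase, strip accents, map ph→f and c→k, collapse repeated chars."""
    s = unicodedata.normalize("NFKD", name.lower())
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = s.replace("ph", "f").replace("c", "k")
    return re.sub(r"(.)\1+", r"\1", s)

def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def coarse_match(a, b, max_dist=1):
    """True if the normalized names are within max_dist edits."""
    return edit_distance(normalize(a), normalize(b)) <= max_dist
```

Note that `coarse_match("jeff", "joff")` is true, which is exactly the kind of over-merge the fine-grained step is meant to split apart.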
![Page 28: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/28.jpg)
Instant Results – Spelling Variations
Indexing time:
NAME: kathryn
CLUSTER: katharine
Query time:
Potential queries: katherine, kathryn, katharine, catharine
Rewritten queries:
?NAME:katherine ?CLUSTER:katharine
?NAME:kathryn ?CLUSTER:katharine
?NAME:katharine ?CLUSTER:katharine
?NAME:catharine ?CLUSTER:katharine
Either match original query term or match the name cluster
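The index-time and query-time halves can be sketched together. `CLUSTER_OF` is a hand-written stand-in for the output of the offline clustering job, and the function names are hypothetical. The `?` prefix from the slide's query syntax is dropped here; the "either term may match" behaviour is expressed by `any`.

```python
# Hand-written stand-in for the offline name-clustering output.
CLUSTER_OF = {
    "katherine": "katharine",
    "kathryn": "katharine",
    "katharine": "katharine",
    "catharine": "katharine",
}

def index_tokens(name):
    """Tokens stored for a document: its own name plus its cluster id."""
    name = name.lower()
    tokens = ["NAME:" + name]
    if name in CLUSTER_OF:
        tokens.append("CLUSTER:" + CLUSTER_OF[name])
    return tokens

def rewrite(term):
    """Optional terms for a query word: the literal name or its cluster."""
    term = term.lower()
    terms = ["NAME:" + term]
    if term in CLUSTER_OF:
        terms.append("CLUSTER:" + CLUSTER_OF[term])
    return terms

def matches(doc_tokens, query_terms):
    """A document matches if any one of the optional terms is present."""
    return any(t in doc_tokens for t in query_terms)
```

Keeping the literal `NAME` term alongside the cluster means names that fall outside every cluster still match exactly.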
![Page 29: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/29.jpg)
Instant Results – Scoring
Learning to Rank (Machine-learned ranking)
Training data
● Click data from previous typeahead sessions
● <searcher, query, doc> ⇒ positive/negative
Clicked result treated as positive (+).
All other shown results treated as negative (–).
Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.
Features / signals
● Textual match against various fields
● Network distance, number of shared connections
● Global popularity
● Compound features
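The labeling rule for training data is simple enough to state in code. A minimal sketch, assuming each typeahead session is available as the searcher id, the typed query, the list of shown doc ids, and the clicked doc id; the function name and tuple layout are illustrative.

```python
def label_session(searcher, query, shown, clicked):
    """Turn one typeahead session into <searcher, query, doc, label> rows.

    The clicked result is the single positive (1); every other result
    that was shown is a negative (0).
    """
    return [(searcher, query, doc, 1 if doc == clicked else 0)
            for doc in shown]
```

For example, a session where searcher 1 typed "richard b", saw docs 4, 1, and 15, and clicked doc 1 yields one positive row and two negative rows.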
![Page 30: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/30.jpg)
Conclusions
● Instant search experience
○ Directly satisfy navigational searches in typeahead via Instant Results
○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions
● Combination of techniques
○ Query tagger for entity extraction – “Things not Strings”
○ FST-based query completion
○ Inverted index-based instant results + Early termination + Weighted OR
○ Name clusters for fuzzy name matching
![Page 31: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/31.jpg)
Future Work
● Personalized query completions
○ m ⇒ machine learning
○ m ⇒ machinist
● Multi-entity query suggestions
○ Now: [linkedin] ⇒ “Find people who work at LinkedIn”
○ Future: [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
● Better blending
○ Autocomplete + query suggestions + instant results
○ Query features – what does the query mean?
○ Results features – what results come back from each system?
![Page 32: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/32.jpg)
Thank You!
![Page 33: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/33.jpg)
LinkedIn – The Economic Graph
![Page 34: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/34.jpg)
LinkedIn Search – SERP (Jobs)
![Page 35: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/35.jpg)
LinkedIn Search – Typeahead
![Page 36: Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn](https://reader030.vdocuments.net/reader030/viewer/2022020301/587d81f01a28ab634b8b7fc1/html5/thumbnails/36.jpg)
LinkedIn Search – SERP