session #2, tech session: build realtime search by sylvain utard from algolia

Title Build Realtime Search From mobile SDK to SaaS, a tech POV Sylvain Utard SaaSisBeautiful #2 – June 2014

Upload: saas-is-beautiful

Post on 07-Jul-2015


DESCRIPTION

Sylvain Utard, VP of Engineering at Algolia, presents how they are building a realtime search engine.

TRANSCRIPT

Page 1: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Build Realtime Search
From mobile SDK to SaaS, a tech POV

Sylvain Utard

SaaSisBeautiful #2 – June 2014

Page 2: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Search

• Today, search means Google
• Search is a daily activity
• Search is complex
• Databases do not handle text queries
• Speed and relevance are key
• Fuzzy matching (typo-tolerance)

Page 3: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Why Search Engines?

• Databases
  • Optimized for INSERT/UPDATE/DELETE/SELECT (that's a lot)
  • Structured query syntax (mostly SQL)
  • Some operations scan all your rows

Page 4: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Why Search Engines?

• Search engines
  • HIGHLY optimized for “SELECT” (only)
  • Full-text queries: understand what a word is
  • Query execution time driven by the number of matching documents
  • And obviously, “LIKE '%foo%'” is not full-text search
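To illustrate the last bullet, here is a minimal Python sketch contrasting substring matching (roughly what `LIKE '%foo%'` does) with word-level matching. The sample records and the `tokenize` helper are assumptions made for the example, not part of the talk.

```python
import re

# Hypothetical sample rows, for illustration only.
records = ["foo fighters", "tofoo burger", "Foo, bar and baz"]

def substring_match(query, text):
    # Roughly what LIKE '%query%' does: any substring occurrence counts.
    return query.lower() in text.lower()

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits are "words".
    return re.findall(r"[a-z0-9]+", text.lower())

def word_match(query, text):
    # Full-text style: the query has to match a whole word.
    return query.lower() in tokenize(text)

like_hits = [r for r in records if substring_match("foo", r)]
word_hits = [r for r in records if word_match("foo", r)]
```

The substring version also returns "tofoo burger", and it has to scan every row to do so; the word-level version only returns records that contain the word "foo".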

Page 5: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

How does it work?

• Indexing (input = documents)
  • Multiple attributes (textual, numerical, geo)
• Search (input = query, output = documents)
  • Full-text queries and/or numerical filters
  • Understandable results: score (ranking) + highlighting
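Highlighting can be sketched in a few lines of Python. This is only an illustration of the idea: the `highlight` function and the `<em>` tags are assumptions, and a real engine would reuse its own tokenizer rather than a regular expression.

```python
import re

def highlight(text, query, open_tag="<em>", close_tag="</em>"):
    # Wrap every whole-word occurrence of each query term in tags.
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

result = highlight("Build realtime search", "search build")
```

Here `result` keeps the original casing of the text and marks both matched words, which is what makes a result "understandable" to the end user.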

Page 6: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation

• 2 distinct processes
  • Indexing
    • Storing documents in a highly optimized way
  • Query
    • Matching documents
    • Ranking matched documents

Page 7: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation: Indexing process

• Indexing means building an “index”, also called “inverted lists”
  • A dedicated data structure optimized for search (only)
  • Input = a set of documents containing words
  • Output = a set of words, each associated with the documents that contain it

Page 8: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation: Indexing process

Documents:
  Doc 1: foo bar baz
  Doc 2: bar foo
  Doc 3: baz baz qux

Indexing →

Index (inverted lists):
  foo → Doc 1, Doc 2
  bar → Doc 1, Doc 2
  baz → Doc 1, Doc 3
  qux → Doc 3
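The indexing step on this slide can be sketched in Python as follows. It is a minimal illustration using the three documents from the slide; splitting on whitespace is an assumption, and a production engine would store the lists in a far more compact form.

```python
from collections import defaultdict

# The three documents shown on the slide, keyed by document number.
documents = {1: "foo bar baz", 2: "bar foo", 3: "baz baz qux"}

def build_index(docs):
    # Map each word to the set of documents that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    # Sorted lists make intersections and display deterministic.
    return {word: sorted(ids) for word, ids in index.items()}

index = build_index(documents)
```

Note that "baz" appears twice in Doc 3 but the document is listed only once in its inverted list.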

Page 9: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation: Query Process

• Queries
  • Goal = retrieve all documents matching a user query
  • Order results from the highest ranked to the lowest

Page 10: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation: Query Process

User query: "baz"

Index (inverted lists):
  foo → Doc 1, Doc 2
  bar → Doc 1, Doc 2
  baz → Doc 1, Doc 3
  qux → Doc 3

Steps:
  1. Sort matching documents
  2. Pagination

Page 11: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Implementation: Query Process

User query: "baz qux"

Index (inverted lists):
  foo → Doc 1, Doc 2
  bar → Doc 1, Doc 2
  baz → Doc 1, Doc 3
  qux → Doc 3

Steps:
  1. Intersect inverted lists
  2. Sort matching documents
  3. Pagination
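The query steps from the two slides above (look up, intersect, sort, paginate) can be sketched like this. The `search` function and its `rank`, `page` and `hits_per_page` parameters are hypothetical names chosen for the example.

```python
# Inverted lists for the three example documents.
index = {"foo": [1, 2], "bar": [1, 2], "baz": [1, 3], "qux": [3]}

def search(index, query, rank=None, page=0, hits_per_page=10):
    # 1. Fetch one inverted list per query word (empty if the word is unknown).
    lists = [set(index.get(word, [])) for word in query.split()]
    if not lists:
        return []
    # 2. Intersect the lists: a document must contain every query word.
    matches = set.intersection(*lists)
    # 3. Sort matching documents (by document number unless a rank key is given).
    ordered = sorted(matches, key=rank)
    # 4. Pagination: return only the requested slice.
    start = page * hits_per_page
    return ordered[start:start + hits_per_page]

single = search(index, "baz")
both = search(index, "baz qux")
```

The cost is driven by the length of the inverted lists involved, not by the total number of documents, which matches the earlier point about query execution time.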

Page 12: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Database Search

[Figure labels: Documents, Database entries]

Page 13: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

• Founded in 2012
• 2012 → Mar 2013
  • Mobile-oriented
• Now: SaaS-oriented
  • Search engine as a Service

Page 14: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Mobile first

• Embed a Search Engine in your App
  • iOS, Android, Windows Phone
• SDK/library provider
• Offline
• Ideal customers
  • Evernote, Contacts, POI, …

Page 15: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Mobile focus

• Search as you type
• Typo-tolerance
• High-performance
• Target most phones
  • Starting from the cheapest Android phone
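Search as you type combined with typo-tolerance can be approximated with an edit distance on word prefixes. The sketch below uses the classic Levenshtein distance and a hypothetical `fuzzy_prefix_matches` helper; it is a simplification for illustration and not a description of how the Algolia engine implements it.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def fuzzy_prefix_matches(typed, vocabulary, max_typos=1):
    # Simplification: compare the typed text with the same-length prefix
    # of each indexed word and accept it within max_typos edits.
    matches = []
    for word in vocabulary:
        if edit_distance(typed, word[:len(typed)]) <= max_typos:
            matches.append(word)
    return matches

# Hypothetical vocabulary of indexed words.
vocabulary = ["evernote", "events", "contacts"]
exact = fuzzy_prefix_matches("eve", vocabulary)
typo = fuzzy_prefix_matches("evar", vocabulary)
```

Scanning the whole vocabulary for every keystroke would be too slow on a cheap phone; the point here is only the matching rule, not the data structure behind it.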

Page 16: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Mobile Performance

• 10-20 queries / sec
• Realtime if <100ms
• 1 sec to build a 10K entries index
• C++ engine + Objective-C/C#/Java interfaces
• <100KB of RAM, whatever the index size

Page 17: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

What about hosted search?

• Same issues on websites & apps
• Users are used to Google/Amazon: it just works
• Poor search experience everywhere
• SQL/NoSQL technologies are not providing any working solution

Page 18: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Hosted Search

1. Push a copy of your data
2. Get blazing fast search

Page 19: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Alternatives

• Open-source
  • ElasticSearch, Solr, Sphinx
• Commercial
  • Hosted ElasticSearch/Solr/Sphinx
  • Enterprise-oriented on-premise engines

Page 20: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Alternatives

• Mostly document oriented
• Designed to search in “big” documents
• Statistical ranking algorithm
• No instant-search capabilities

Page 21: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Database Search

• Database Search
  • Semi-structured objects (multiple attributes)
  • Give importance to the right attributes
  • Combine text relevance & record popularity

Page 22: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Record rank

• No stats, no TF-IDF, no “score”
• Tie-breaking based, one criterion after another
  1. # typos
  2. geo
  3. proximity
  4. attribute weight
  5. exact match
  6. custom
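Tie-breaking ranking maps naturally onto a lexicographic sort: a criterion is only consulted when all earlier ones are equal. The sketch below follows the order of the six criteria on the slide; the `Hit` fields, their units, and the direction of each comparison (for example, popularity as the custom criterion) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    # Hypothetical per-hit values computed at query time.
    name: str
    typos: int            # 1. fewer typos first
    geo_distance_m: int   # 2. closer first
    proximity: int        # 3. query words closer together first
    attribute_rank: int   # 4. match in a more important attribute first
    exact_words: int      # 5. more exactly matched words first
    popularity: int       # 6. custom criterion, higher first

def ranking_key(hit):
    # Tuples compare element by element, so each criterion only
    # breaks ties left by the previous ones.
    return (hit.typos, hit.geo_distance_m, hit.proximity,
            hit.attribute_rank, -hit.exact_words, -hit.popularity)

hits = [
    Hit("A", typos=1, geo_distance_m=0, proximity=1, attribute_rank=0, exact_words=1, popularity=900),
    Hit("B", typos=0, geo_distance_m=0, proximity=2, attribute_rank=0, exact_words=1, popularity=10),
    Hit("C", typos=0, geo_distance_m=0, proximity=1, attribute_rank=1, exact_words=1, popularity=50),
    Hit("D", typos=0, geo_distance_m=0, proximity=1, attribute_rank=1, exact_words=1, popularity=700),
]
ranked = [hit.name for hit in sorted(hits, key=ranking_key)]
```

Hit A is the most popular record but ends up last because it needed a typo to match, which is the difference from a single blended score.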

Page 23: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Repackaging + Improvements

• C++ mobile SDK → C++ backend search engine
  • hosted as an NGINX module
  • multi-tenant (shared resources)
  • fault-tolerant (SLA 99.99%)
• Faceting, synonyms, analytics, …

Page 24: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

SaaS Architecture

• Each cluster = 3 machines
  • Distributed consensus (SLA)
• Multiple datacenters (EU, US, ASIA)
• Bare-metal servers
  • 6c (12t) 3.5GHz
  • 128GB RAM
  • 2x480GB SSD (RAID-0)
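The slides do not say which consensus protocol is used. As a rough illustration of why three machines tolerate the loss of one, here is a sketch of a majority rule only; the `Replica` class and `replicate` function are hypothetical, and a real protocol also handles leader election and repairing replicas that missed writes.

```python
class Replica:
    # Hypothetical in-memory stand-in for one machine of the cluster.
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def append(self, operation):
        # A machine that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.log.append(operation)
        return True

def replicate(replicas, operation):
    # The write counts as committed once a majority has acknowledged it.
    acks = sum(1 for replica in replicas if replica.append(operation))
    quorum = len(replicas) // 2 + 1
    return acks >= quorum

one_down = [Replica("a"), Replica("b"), Replica("c", up=False)]
two_down = [Replica("a"), Replica("b", up=False), Replica("c", up=False)]
committed = replicate(one_down, {"op": "add", "id": 1})
rejected = replicate(two_down, {"op": "add", "id": 1})
```

With three machines the quorum is two, so one failure is tolerated while a second one stops writes from being committed.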

Page 25: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia


SaaS Architecture v1

Page 26: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

SaaS Architecture v2

• More and more users
• API slaughter
• Too many I/O
  • Writes / sec
  • Consensus

Page 27: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia


SaaS Architecture v2

Page 28: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

SaaS Security

• Data privacy
  • Send us only non-critical data
  • Dedicated cluster
• Per end-user security
  • Restrict the result set per end-user, per tag, …
• Crawling
  • Built-in rate-limits
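Two of these points lend themselves to a short sketch: restricting results by tag and limiting the request rate. Both the `restrict` function and the `FixedWindowLimiter` class are hypothetical; the talk does not describe how either is implemented, and the fixed-window algorithm is just one simple option.

```python
def restrict(hits, allowed_tags):
    # Keep only the records that carry at least one tag the caller may see.
    allowed = set(allowed_tags)
    return [hit for hit in hits if allowed & set(hit["tags"])]

class FixedWindowLimiter:
    # Allows at most max_calls per key in each window of window_s seconds.
    def __init__(self, max_calls, window_s):
        self.max_calls = max_calls
        self.window_s = window_s
        self.state = {}  # key -> (window index, calls seen in that window)

    def allow(self, key, now):
        window = int(now // self.window_s)
        seen_window, count = self.state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window started: reset the counter
        if count >= self.max_calls:
            self.state[key] = (window, count)
            return False
        self.state[key] = (window, count + 1)
        return True

# Hypothetical records tagged per end-user.
hits = [
    {"id": 1, "tags": ["user_42"]},
    {"id": 2, "tags": ["user_7"]},
    {"id": 3, "tags": ["public", "user_7"]},
]
visible = [hit["id"] for hit in restrict(hits, ["user_42", "public"])]

limiter = FixedWindowLimiter(max_calls=2, window_s=60)
decisions = [limiter.allow("1.2.3.4", now) for now in (0, 1, 2, 61)]
```

Passing `now` explicitly keeps the limiter deterministic in tests; a caller would normally pass `time.time()`.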

Page 29: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

What about scalability?

• 2B operations in June
• 30% month-over-month growth in MRR
• 40+ servers

Page 30: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Monitoring

• ServerDensity
• Custom probes
• Alerts
  • SMS
  • Email

Page 31: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

RAM

• RAM over-booking
• Small memory footprint per index
• All indexes are mmapped
• Lazy-loading (no query = no RAM consumption)
• SSD
• Disable swapping
• Set up a new cluster if the current one is full
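Memory-mapping an index file means the operating system loads pages only when they are read, which is what makes lazy loading possible. The following Python sketch shows the mechanism on a toy file; the file layout (a flat array of 32-bit document ids) is an assumption for illustration and not the actual index format.

```python
import mmap
import os
import struct
import tempfile

def read_doc_id(mapped, position):
    # Reading a slice touches only the pages that back it;
    # the rest of the file does not have to be resident in RAM.
    (doc_id,) = struct.unpack_from("<I", mapped, position * 4)
    return doc_id

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.index")
    # Toy "index": 1000 little-endian 32-bit document ids.
    with open(path, "wb") as f:
        f.write(struct.pack("<1000I", *range(1000)))

    with open(path, "rb") as f:
        # Mapping the file reserves address space but reads nothing yet.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            first = read_doc_id(mapped, 0)
            last = read_doc_id(mapped, 999)
        finally:
            mapped.close()
```

An index that is never queried is never paged in, so many mostly idle indexes can share one machine, which is the idea behind RAM over-booking.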

Page 32: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Network

• Do NOT trust your default system configuration
  • I/O: not optimized for SSD
  • Memory: not optimized for 128GB RAM
  • Network: not optimized for 10K+ keep-alive connections

Page 33: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Deployment

• Automatic
• Ability to roll back
• Ability to test on a “fake” production env

Page 34: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia

Hardware

• Your server will:
  • reboot
  • crash
  • explode
• Make it happen now!

Page 35: Session #2, tech session: Build realtime search by Sylvain Utard from Algolia


Questions?