mongodb at baidu

MongoDB@Baidu Xiao Beibei Project Owner & Senior Developer

Upload: mat-keep

Post on 07-Apr-2017

TRANSCRIPT

Page 1: MongoDB at Baidu

MongoDB@Baidu  

Xiao Beibei Project Owner & Senior Developer

Page 2: MongoDB at Baidu

Baidu  

Page 3: MongoDB at Baidu

Who are we?

✓ Largest internet search service in China
✓ Various products, solutions & services
✓ NASDAQ: BIDU | Market cap: 64B | Revenue: 10B | Quarterly growth: 33.10%

Page 4: MongoDB at Baidu

Story between 2 “Giants”

Who am I?

✓ Senior NoSQL developer
✓ Owner of various MongoDB projects
✓ In charge of the LARGEST MongoDB cluster in CHINA

Page 5: MongoDB at Baidu

Where does MongoDB fit?

Page 6: MongoDB at Baidu

Small Step → Big Surprise

• Started with Baidu Address Book
  ✓ Small project
  ✓ Various sources
  ✓ Flexible schema
• More than 300 million users

Page 7: MongoDB at Baidu

Success + Confidence = More Projects

• Message & Multimedia Message projects
• Netdisk picture metadata
• Facial Recognition System
• User Operation Log System
• Baidu Cloud
• Baidu Post Bar
• …

✓ Over 100 businesses
✓ Drive metadata > 200B
✓ PB level

Page 8: MongoDB at Baidu

Big MongoDB Cluster

• Consolidate the entrance
• All use SSD + RAID 0
• Most: 1 primary, 2 secondaries, 2 arbiters
• Some: 1 primary, 2 secondaries, 1 arbiter

[Diagram: a REST MongoDB service API sits in front of multiple standard MongoDB clusters; each cluster consists of mongos routers, config servers, and shards made up of primary (P), secondary (S), and arbiter (A) members.]

Page 9: MongoDB at Baidu

How do we use MongoDB?

Page 10: MongoDB at Baidu

Throughput!!!

Problem

• Everything ran well, BUT when WRITES exceeded 10 thousand QPS:
  • Writes timed out
  • mongod memory usage increased
  • Reads were impacted and queries slowed down

Page 11: MongoDB at Baidu

Simple way is the BEST!

Root cause: cache replacement

• In 3.0, cache replacement does not work efficiently

Solution

• Pilot an upgrade to 3.2

Page 12: MongoDB at Baidu

Replication makes this possible

Problem

Online index creation issue
• Time-consuming
• Direct (foreground) or background build
• Write timeouts during creation

Solution

• Create the index on each member in turn
• Secondaries first, primary last
• Oplog time
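
The ordering in the solution above can be sketched as a small planning helper. This is only an illustration, not Baidu's tooling: the host names, the role strings, and the action labels are hypothetical, and reading the "oplog time" bullet as "the build must finish inside the oplog window so the member can catch up" is an assumption about what the slide means.

```python
from typing import Dict, List, Tuple


def rolling_index_plan(members: Dict[str, str]) -> List[Tuple[str, str]]:
    """Order a rolling index build: secondaries one by one, primary last.

    `members` maps a (hypothetical) host name to its role: "PRIMARY",
    "SECONDARY" or "ARBITER". Arbiters hold no data, so they are skipped.
    """
    primaries = sorted(h for h, role in members.items() if role == "PRIMARY")
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    secondaries = sorted(h for h, role in members.items() if role == "SECONDARY")

    # Each secondary builds the index in turn while the others keep serving.
    steps = [(host, "build") for host in secondaries]
    # The primary goes last: it is stepped down first so writes are not blocked.
    steps.append((primaries[0], "step_down_then_build"))
    return steps


def fits_oplog_window(build_seconds: float, oplog_window_seconds: float) -> bool:
    """Assumed meaning of 'oplog time': a member that is busy building an
    index can only catch up afterwards if the build is shorter than the
    time span covered by the oplog."""
    return build_seconds < oplog_window_seconds
```

For a set with one primary, two secondaries, and one arbiter, the plan lists both secondaries before the primary and never touches the arbiter.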

Page 13: MongoDB at Baidu

Big Issue

Problem

• Data increases rapidly → clusters grow accordingly
• Largest cluster = 160 shards, 2 TB each
• Queries slow!!!

Why?
• The MongoDB balancer uses a single thread to move data
• Pros & cons

Page 14: MongoDB at Baidu

Mitigation
• Reduced the balancer window from 24 to 6 hours, so that it ran in off-peak hours
• A good approach for a period of time, BUT when more …
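
A minimal sketch of the balancer-window idea follows. The slides do not say which six hours were chosen, so any start and stop times passed in are assumptions. The settings document mirrors the `activeWindow` entry that MongoDB keeps in the `config.settings` collection, and the helper shows how a window that wraps past midnight can be checked.

```python
from datetime import time


def balancer_window_setting(start: str, stop: str) -> dict:
    """Build the balancer settings document for a window given as "HH:MM"
    strings. The concrete hours are the caller's choice (an assumption here)."""
    return {"_id": "balancer", "activeWindow": {"start": start, "stop": stop}}


def in_balancer_window(now: time, start: time, stop: time) -> bool:
    """Return True if `now` falls inside [start, stop), allowing the window
    to wrap around midnight (for example 23:00 to 05:00)."""
    if start <= stop:
        return start <= now < stop
    return now >= start or now < stop
```

With a hypothetical 00:00 to 06:00 window, a chunk move at 02:00 would be allowed and one at noon would wait until the next night.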

Solution

• Shard key: uid or hash?
• Pre-allocating chunks
• Balancer or oplog?
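
To make the pre-allocation bullet concrete, here is a sketch of how evenly spaced split points could be computed for a hashed shard key. The slides do not state which shard key was finally chosen, so this assumes a hashed key whose values fall in the signed 64-bit integer range; the function names and the round-robin placement are illustrative only.

```python
from typing import Dict, List

HASH_MIN = -2**63  # assumed lower bound of the hashed key space
HASH_MAX = 2**63   # assumed (exclusive) upper bound


def presplit_points(num_chunks: int) -> List[int]:
    """Return the num_chunks - 1 boundaries that cut the hash range into
    equally sized chunks, so chunks exist before any data arrives."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    span = HASH_MAX - HASH_MIN
    return [HASH_MIN + span * i // num_chunks for i in range(1, num_chunks)]


def assign_chunks(num_chunks: int, shards: List[str]) -> Dict[str, List[int]]:
    """Place chunk indexes on shards round-robin so that every shard starts
    with a near-equal share and the balancer has little to move."""
    placement: Dict[str, List[int]] = {name: [] for name in shards}
    for chunk in range(num_chunks):
        placement[shards[chunk % len(shards)]].append(chunk)
    return placement
```

Creating the chunks up front means new writes spread across shards immediately instead of landing on one shard and waiting for the single-threaded balancer to redistribute them.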

Page 15: MongoDB at Baidu

Native Auto Balance

Solution

[Diagram: message flow between config server, mongos, shard1, and shard2. Labels: move certain chunk to shard2; "please receive data"; data transferring …; incremental data sync; update chunk information; update chunk manager (on both shards); update chunk cache; delete or not delete.]

Page 16: MongoDB at Baidu

Modified Balancer

Solution

[Diagram: config server, mongos, shard1, and shard2. Labels: data transferring …; update chunk information; update chunk manager (on both shards); update when WriteBack.]

Page 17: MongoDB at Baidu

Iteration in Detail

• Identify: identify a range to be migrated.
• Record: take a note of the current oplog time.
• Query: send a query to the source shard, and iterate over the returned cursor to write matching documents to the destination shard.
• Scan & Match: scan the oplog from the source shard for events recorded from the timestamp noted at the start of this pass; matching events are then written to the destination shard.
• Apply: when the last oplog event has been applied, the pass has completed and the worker process can be stopped.
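
The pass described above can be imitated with plain in-memory structures. The sketch below is not the modified balancer itself: the shards are Python dicts keyed by `uid`, the oplog is a list of `(timestamp, op, document)` tuples, and `during_copy` is a hypothetical hook that simulates writes arriving on the source while the copy is running.

```python
from typing import Callable, Dict, List, Optional, Tuple

Doc = Dict[str, int]
OplogEntry = Tuple[int, str, Doc]  # (timestamp, op, doc); op is "i", "u" or "d"


def migrate_range(
    source: Dict[int, Doc],
    oplog: List[OplogEntry],
    dest: Dict[int, Doc],
    lo: int,
    hi: int,
    during_copy: Optional[Callable[[Dict[int, Doc], List[OplogEntry]], None]] = None,
) -> None:
    """One migration pass for uids in [lo, hi)."""
    # Record: note the current oplog time before copying anything.
    start_ts = oplog[-1][0] if oplog else 0

    # Query: copy every document in the range to the destination.
    for uid, doc in list(source.items()):
        if lo <= uid < hi:
            dest[uid] = dict(doc)

    # Simulated concurrent writes that the copy above did not see.
    if during_copy is not None:
        during_copy(source, oplog)

    # Scan & Match: replay oplog events from the recorded timestamp onward
    # that touch the migrated range. Replay is idempotent in this model.
    for ts, op, doc in oplog:
        if ts < start_ts or not (lo <= doc["uid"] < hi):
            continue
        if op == "d":
            dest.pop(doc["uid"], None)
        else:  # insert or update: write the latest version
            dest[doc["uid"]] = dict(doc)
    # Apply: once the last event has been replayed, the pass is complete.
```

Because the copy and the replay are separate steps, writes that land on the source during the copy are not lost; they are picked up from the oplog afterwards.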

Page 18: MongoDB at Baidu

Summary  

Page 19: MongoDB at Baidu

Quick Summary

• Early adoption makes us
• 100+ diverse apps & more are coming
• $$$ Cost savings with awesome scalability
• Continuous improvements = Confidence
• Add LSM to WT for better insert performance
• Multi-master as an option

Page 20: MongoDB at Baidu

Key Takeaways

• Baidu = Big system + Big data + Big challenge
  – We need a strong & scalable DB architecture; MongoDB is fantastic!
• Upgrading to 3.x is a MUST
  – WT engine, document validation, …
• Innovation & automation via customized scripts

MongoDB CAN manage our “BIG DATA”

• 600 nodes
• 160 shards
• 200B documents

Page 21: MongoDB at Baidu

Next Steps

MongoDB is enhancing balancer performance
• Enabling parallel chunk migration
• Removing throttling by default (for WiredTiger)

Working with MongoDB as the beta tester for the new feature

Page 22: MongoDB at Baidu

Questions?