Building Social Analytics Tool with MongoDB: A Developer's Perspective

Upload: mongodb-apac

Post on 22-Jun-2015


TRANSCRIPT

Page 1: Buildingsocialanalyticstoolwithmongodb

Building Social Analytics Tool with MongoDB - A Developer's Perspective

Page 2: Buildingsocialanalyticstoolwithmongodb

Agenda

1. Product Overview
2. Why MongoDB for us?
3. Aggregation queries to the rescue
4. How did JavaScript help us?
5. Experiences with indexes
6. In-progress use cases
7. Tips & Tricks
8. Demo

Page 3: Buildingsocialanalyticstoolwithmongodb

About me

• Abhishek Tejpaul
• Software Developer @ IntelliGrape Software
• Loves Grails, Git and Linux
• [email protected]

Page 4: Buildingsocialanalyticstoolwithmongodb

Product Overview – Information Flow

• Data sources: DataSift, Instagram, Web Crawler 1, Web Crawler …
• Data store: MongoDB

Page 5: Buildingsocialanalyticstoolwithmongodb

Product Overview – Results

Page 6: Buildingsocialanalyticstoolwithmongodb

Product Overview – Results

Page 7: Buildingsocialanalyticstoolwithmongodb

Product Overview – Results

Page 8: Buildingsocialanalyticstoolwithmongodb

Why MongoDB for us?

• Schema-less data. Typical data sources
• Adding new social platforms in future
• Needed fast read-write operations

Page 9: Buildingsocialanalyticstoolwithmongodb

Aggregation Queries – Getting Insights

• Combination of queries chained together
• At every stage, we can filter/chain/massage data

Image credit: https://www.openshift.com/blogs/an-overview-of-whats-new-in-mongodb-22

Page 10: Buildingsocialanalyticstoolwithmongodb

Our use-case (esp. for graphs)

• Sentiment analysis
• Demographic analysis
• Article analysis
• Plan
  • Creation of intelligence tables in advance
• Reality
  • On-the-fly analysis using aggregation queries

Page 11: Buildingsocialanalyticstoolwithmongodb

How to go about it?

• Operates on a single collection
• Think about the data you have and the insights you want
• Focus on reducing data size early on
  • $match
  • $project
  • $sort
  • $limit, $skip

• Example

db.collectionName.aggregate([
  { "$match" : { fieldName : matchingValue } },
  { "$project" : { oldOrNewField : fieldValue } },
  { "$group" : { "_id" : "$oldOrNewField", "sum" : { "$sum" : 1 } } },
  { "$sort" : { "sum" : -1 } },
  { "$limit" : 20 }
])
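The "filter early, then group, sort and limit" shape above can be built as plain data before it is handed to the database. The following is a minimal sketch, not the code from the talk: the helper name `buildTopValuesPipeline` and the field names `isActive` and `sentiment` are assumptions made for illustration.

```javascript
// Hypothetical helper: builds a "top N values of one field" pipeline.
// Nothing here talks to MongoDB; it only assembles the stage documents.
function buildTopValuesPipeline(matchFilter, groupField, limit) {
  const projection = {};
  projection[groupField] = 1; // keep only the grouped field ($project keeps _id by default)

  return [
    { $match: matchFilter },                                  // reduce the data set first
    { $project: projection },                                 // drop fields we do not need
    { $group: { _id: "$" + groupField, sum: { $sum: 1 } } },  // count documents per value
    { $sort: { sum: -1 } },                                   // most frequent values first
    { $limit: limit }                                         // keep the top N
  ];
}

// Assumed example: top 20 sentiment values among active documents.
const pipeline = buildTopValuesPipeline({ isActive: true }, "sentiment", 20);
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell the resulting array could then be passed as `db.collectionName.aggregate(pipeline)`. Keeping `$match` first means the later stages only see documents that survived the filter.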

 

Page 12: Buildingsocialanalyticstoolwithmongodb

Javascript Capabilities

• All the programming capabilities of the JavaScript language at your disposal
• Taking business logic / processing to your data store

Page 13: Buildingsocialanalyticstoolwithmongodb

Javascript – Our use-cases

• Remove garbage data at DB level
  • Wrong results from Twitter
  • Filtering out STOP keywords

db.IgnoreList.findOne().stopWords.forEach(function(data) {
  db.ProcessedArticle.update(
    { "isActive" : true, "isIgnored" : {"\$ne" : true} },
    {
      "\$pull" : {"topicOfDiscussion" : {"name" : data}},
      "\$set" : {"isIgnored" : true}
    },
    { "multi" : true }
  )
});
return true
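One thing to watch in the loop above: the first update sets `isIgnored` to true on every matching document, so with the same `isIgnored != true` filter the later stop words may find nothing left to update. If that is not intended, one possible alternative is a single update that pulls all stop words at once with `$in`. The sketch below only builds the update documents; the helper name `buildStopWordUpdate` and the sample stop words are assumptions.

```javascript
// Hypothetical helper: one update spec that removes every stop word in a single pass.
function buildStopWordUpdate(stopWords) {
  return {
    // Same filter as the slide: active articles not yet cleaned.
    query: { isActive: true, isIgnored: { $ne: true } },
    update: {
      // $pull removes every array element whose name is one of the stop words.
      $pull: { topicOfDiscussion: { name: { $in: stopWords } } },
      // Flag the document only once, after all stop words are handled.
      $set: { isIgnored: true }
    },
    options: { multi: true }
  };
}

// Assumed sample data; the real list lives in the IgnoreList collection.
const spec = buildStopWordUpdate(["the", "and", "of"]);
console.log(JSON.stringify(spec, null, 2));
```

In the mongo shell this could be applied as `db.ProcessedArticle.update(spec.query, spec.update, spec.options)`, with the stop words read from `db.IgnoreList.findOne().stopWords`.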

 

Page 14: Buildingsocialanalyticstoolwithmongodb

Javascript – Caveats

• Takes up read-write locks on the entire database
  • Can be run with the {nolock : true} option

db.runCommand({
  eval: <function>,
  args: <args>,
  nolock: <true/false>
})

• Can be replaced by mapReduce in most cases
• Treat it as a one-off case

Page 15: Buildingsocialanalyticstoolwithmongodb

Indexes – Our use-cases

• dropDups
  {dropDups : true}
• background
  {background : true}
• Time to Live
  {expireAfterSeconds : 3600}
• Compound indexing
  {key1 : 1, key2 : 1} != {key1 : 1}
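The four index options above can be kept together as data and applied in one loop. This is a minimal sketch under assumptions: the field names (`url`, `source`, `createdAt`, `postedOn`) and the helper `applyIndexes` are hypothetical, and a stand-in object replaces the real collection so the block runs without a server. Note that `dropDups` only has an effect together with `unique`, and that `dropDups` and `ensureIndex` belong to older MongoDB releases (newer ones use `createIndex` and no longer support `dropDups`).

```javascript
// Hypothetical index definitions; every field name here is an assumption.
const indexSpecs = [
  { keys: { url: 1 }, options: { unique: true, dropDups: true } },   // drop duplicates while building
  { keys: { source: 1 }, options: { background: true } },            // build without blocking the database
  { keys: { createdAt: 1 }, options: { expireAfterSeconds: 3600 } }, // TTL: expire documents after an hour
  { keys: { source: 1, postedOn: 1 }, options: {} }                  // compound index, distinct from { source: 1 }
];

// Applies each definition through the collection's ensureIndex(keys, options).
function applyIndexes(collection, specs) {
  specs.forEach(function (spec) {
    collection.ensureIndex(spec.keys, spec.options);
  });
  return specs.length;
}

// Stand-in for a real collection so the sketch can run anywhere.
const calls = [];
const fakeCollection = {
  ensureIndex: function (keys, options) {
    calls.push({ keys: keys, options: options });
  }
};

const applied = applyIndexes(fakeCollection, indexSpecs);
console.log(applied + " index definitions applied");
```

In the mongo shell the same helper could be called with a real collection, for example `applyIndexes(db.collectionName, indexSpecs)`.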

Page 16: Buildingsocialanalyticstoolwithmongodb

Our current state

• Faster write operations
  • Under high data load from different sources
• Faster read operations
  • Graph rendering up to 10x quicker
• Ease of scalability
  • Though we are yet to reach that point

Page 17: Buildingsocialanalyticstoolwithmongodb

Work In Progress

• Full-text search implementation
  • Text index can be created only on strings or arrays of strings
  • db.collectionName.ensureIndex( { fieldName : "text" } )
• Capped collections
  • Widgets for last-run jobs / event log tables
  • Very fast writes possible
  • db.createCollection("cName", { capped : true, size : 5242880, max : 5000 } )
  • size argument is always required
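Because `size` is always required for a capped collection while `max` is optional, a small helper can make that rule explicit before `createCollection` is called. This is a sketch only; the helper name `cappedOptions` is an assumption, and it builds the options document without contacting the database.

```javascript
// Hypothetical helper: builds the options document for a capped collection.
function cappedOptions(sizeBytes, maxDocs) {
  // size (in bytes) is mandatory for capped collections.
  if (typeof sizeBytes !== "number" || sizeBytes <= 0) {
    throw new Error("capped collections need a positive size in bytes");
  }
  const options = { capped: true, size: sizeBytes };
  // max (document count) is optional and is added only when given.
  if (maxDocs !== undefined) {
    options.max = maxDocs;
  }
  return options;
}

// 5 MB and at most 5000 documents, matching the values on the slide.
const options = cappedOptions(5 * 1024 * 1024, 5000);
console.log(JSON.stringify(options)); // {"capped":true,"size":5242880,"max":5000}
```

In the mongo shell the result could be used as `db.createCollection("cName", options)`.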

Page 18: Buildingsocialanalyticstoolwithmongodb

Tips / Tricks – Things we learnt

• cloneCollection
  • No more ssh/scp to remote systems
  • db.runCommand({cloneCollection: <nsCollection>, from: <remote>, query: {}})
  • db.cloneCollection(from, collectionName, query)
• db.collectionName.copyTo
  • Does not copy indexes

Page 19: Buildingsocialanalyticstoolwithmongodb

Tips / Tricks – Things we learnt

• remove() vs drop()
  • Can't use remove() for capped collections
  • remove() keeps indexes while drop() clears them
  • To remove all the documents in a collection, use drop()
  • To remove the better part of a large collection, use JavaScript
• pretty() find by default
  • DBQuery.prototype._prettyShell = true (inside your ~/.mongorc.js)

Page 20: Buildingsocialanalyticstoolwithmongodb

DEMO  

Page 21: Buildingsocialanalyticstoolwithmongodb

I am not a MongoDB expert though :)

Page 22: Buildingsocialanalyticstoolwithmongodb

Thank  You!!