Moving from C#/.NET to Hadoop/MongoDB
![Page 1: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/1.jpg)
Moving from C#/.NET to Hadoop/MongoDB
Robert Vandehey
December 4, 2012
![Page 2: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/2.jpg)
4 © 2012 Rovi Corporation. Company confidential.
We power the Discovery, Delivery and Display of Digital Entertainment
![Page 3: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/3.jpg)
Global Reach
• Viewers use our guide technologies through service provider offerings: 137M+
• Consumer electronic (CE) devices have our CE guide technologies: 266M+
• Households reached globally by Rovi Advertising Network: 40M+
• Devices certified for high quality DivX video playback: 600M+
• Storefronts with entertainment services powered by Rovi Entertainment Store: 47M+
Data coverage:
• TV shows, movies, sports and celebrities: 4.5M+
• Album releases (and 32M music tracks): 3.3M+
• Movie titles: 500K+
![Page 4: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/4.jpg)
![Page 5: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/5.jpg)
![Page 6: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/6.jpg)
![Page 7: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/7.jpg)
The Problem
![Page 8: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/8.jpg)
ETL/Cache Loading Data Takes Too Long
[Diagram: Table Loading Process and Cache Loading Process. Components shown: DSG DB Server(s) (DSG Database); WSP ETL Server (Extract, Transform, CI Database); Node 1 DB Server and Node 2 DB Server (CI Database, Backup & Restore); MemcacheD Cluster; MemcacheDB Cluster; MemcacheD (Scratch Server(s)).]
![Page 9: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/9.jpg)
The Solution
![Page 10: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/10.jpg)
Hadoop/MongoDB
![Page 11: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/11.jpg)
Network Diagram
![Page 12: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/12.jpg)
Mongo Sharding
![Page 13: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/13.jpg)
Challenges
![Page 14: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/14.jpg)
Challenges
• Transition existing Windows/.NET team to Linux/Java
– Environment setup. Technology framework choices
– Coding differences
– Cultural differences
– Platform differences
– Easier than expected to transition team from .NET to Java – No religious battles
• Backwards compatibility of CXF web services to Microsoft .NET web services
• Managing new releases of Hadoop
• BCP took too long
– Converted to base tables. Used Pig to join the data
• Writes to Mongo are very fast. Updates are slower and saturated the disks
– Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo
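The diff idea in the last bullet can be sketched as follows. This is a minimal illustration, not the team's actual job: the slides only say that an MD5 calculation was used so that Hadoop does the comparison and Mongo receives fewer writes. The function names, the record layout (dicts keyed by id), and the use of canonical JSON as the hashing input are all assumptions made for the example.

```python
import hashlib
import json


def record_digest(record):
    """Return the MD5 hex digest of a record's canonical JSON form."""
    # sort_keys makes the digest independent of field order (an assumption
    # about what "unchanged" should mean for this example).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_records(previous_digests, current_records):
    """Compare current records against digests kept from the last load.

    previous_digests: dict mapping record id -> MD5 hex digest
    current_records:  dict mapping record id -> record (a dict)
    Returns (to_upsert, to_delete, new_digests).
    """
    to_upsert = {}
    new_digests = {}
    for record_id, record in current_records.items():
        digest = record_digest(record)
        new_digests[record_id] = digest
        # Unchanged records are skipped, so they cost no database write.
        if previous_digests.get(record_id) != digest:
            to_upsert[record_id] = record
    # Ids that disappeared from the source since the last load.
    to_delete = sorted(set(previous_digests) - set(current_records))
    return to_upsert, to_delete, new_digests
```

Only the records in `to_upsert` and the ids in `to_delete` would then be sent to the database; `new_digests` would be stored for the next run.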
![Page 15: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/15.jpg)
Lessons Learned
![Page 16: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/16.jpg)
Lessons Learned
• General
– Current versions of Hadoop (CDH4) and MongoDB 2.0 are actually very stable products
    • We purchased enterprise support agreements from both Cloudera and 10gen
– Create a developers VM image
– Deploy early and often even if not ready for real customers
– Use the same setup in test and production environments
    • Sharding caused differences
• SQL
– Get raw tables without any transformation or joins
    • Let Hadoop do the processing for you
• Hadoop
– Do as much work as you can in Hadoop
– Take the time to create small datasets to iterate fast
– Take the time to learn and use Pig
    • It is very fast and provides tons of functionality that you don’t need to code in Java
– Don’t create Runners - Use Oozie workflows
– Measure, benchmark and track performance – Use Hadoop counters
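As an illustration of the "use Hadoop counters" advice, here is a small sketch of a mapper that reports counters through Hadoop Streaming, which reads lines of the form `reporter:counter:<group>,<counter>,<amount>` from a task's standard error. The talk's jobs were written in Java and Pig, so using Streaming with Python is a substitution made for this example. The counter group `Load`, the counter names, and the tab-separated input layout are hypothetical.

```python
import sys


def counter_line(group, name, amount=1):
    """Format a Hadoop Streaming counter update line."""
    return "reporter:counter:{},{},{}".format(group, name, amount)


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Emit id/payload pairs from tab-separated input and count outcomes."""
    for line in lines:
        fields = line.rstrip("\n").split("\t", 1)
        if len(fields) != 2 or not fields[0]:
            # Track bad input instead of failing the whole task.
            err.write(counter_line("Load", "MALFORMED_RECORDS") + "\n")
            continue
        out.write("{}\t{}\n".format(fields[0], fields[1]))
        err.write(counter_line("Load", "RECORDS_EMITTED") + "\n")
```

In a Streaming job the script would call `run_mapper(sys.stdin)`, and the totals would appear alongside the job's built-in counters, which makes it possible to compare record counts from one run to the next.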
![Page 17: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/17.jpg)
Lessons Learned - 2
• MongoDB
– RAM, RAM, RAM!!!
– Many writes from Hadoop can easily overwhelm MongoDB
    • Single database lock
    • Drive bandwidth saturation – Can be expanded through sharding
    • Do as much as possible to minimize writes
    • Measure where your application is blocking and optimize
– Don’t shard unless you have to – if you do shard, preconfigure your shard key
    • You need a good shard key
– Use replica sets. They are easy to set up and work well.
    • Make sure the oplog (replication log) is large enough.
– Use MongoDB Monitoring Service (MMS) – It’s free
– Mongo queries are fast!
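Why a "good shard key" matters can be shown with a small simulation. With range-based chunks, a key that only ever increases sends every new write to the chunk that owns the top of the range, while a key whose values are spread out distributes the writes. The slides do not say which shard key the team chose; the hashing below is only one illustrative way to spread keys, and the split points, the one-chunk-per-shard simplification, and the function names are assumptions for this sketch.

```python
import bisect
import hashlib
from collections import Counter


def shard_for(key, split_points):
    """Return the index of the range-based chunk that owns `key`."""
    return bisect.bisect_right(split_points, key)


def hashed(key):
    """Map a key to a stable, well-spread integer in [0, 2**32)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def write_distribution(keys, split_points):
    """Count how many writes each chunk receives."""
    return Counter(shard_for(k, split_points) for k in keys)


# Existing ids 0..999 were split into four chunks; new ids keep increasing.
new_ids = range(1000, 2000)
monotonic = write_distribution(new_ids, [250, 500, 750])

# The same inserts keyed by a hash, with four equal-width chunks.
spread = write_distribution(
    [hashed(i) for i in new_ids], [2**30, 2**31, 3 * 2**30]
)
```

Here `monotonic` places all 1,000 writes in the last chunk, which is the hot-spot pattern that saturates a single shard's disks, while `spread` divides them across all four chunks.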
![Page 18: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/18.jpg)
Mongo Query – returns 90 rows from a database of 9 million in 44ms
![Page 19: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/19.jpg)
Q&A
![Page 20: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/20.jpg)
Follow-up Information
• Email: [email protected]
• LinkedIn: http://www.linkedin.com/in/bvandehey
• Twitter: @bvandehey
• Rovi Cloud Services: http://developer.rovicorp.com/
![Page 21: Moving from C#/.NET to Hadoop/MongoDB](https://reader036.vdocuments.net/reader036/viewer/2022062617/54c31af04a7959c1668b4569/html5/thumbnails/21.jpg)
Thank You