Lean & Agile with MongoDB
DESCRIPTION
My slides on lean & agile development with MongoDB from MongoMunich 2012. The talk shows how you can speed up development with NoSQL technologies and also gives some examples from a big data project.
TRANSCRIPT
Lean & Agile with MongoDB
MongoMunich 2012
#MongoDBMunich @comsysto
Monday, October 15, 12
About us
• first partner of 10gen in Germany (January 2012)
About me
• Lead DevOps Engineer @comsysto
• @loomit
• Data Nerd
• 3 years of high performance web ops
• joined comSysto in March 2012
Questions
• Please ask during the presentation!
Lean?
Continuous Innovation
• Instant feedback from customers about features
• eliminate waste
Eliminate waste
Agile?
• Iterative and incremental
SCRUM
• Scrum is a framework for developing and sustaining complex products
Kanban
• Pull from a work queue
• Originated at Toyota in the 1950s
Agile Adoption
• Ken Schwaber
• “There is no SCRUM police”
• “Use your intelligence”
• Dogmatic Slumber
Don’t be the little girl
Don’t be the Joker
Cross functional teams
8 hats
Co-location
Appreciation for simplicity
• “Everything should be as simple as possible, but not simpler”
• paraphrasing Albert Einstein
Look familiar?
NOSQL
Schema Free
“Your data schema is a direct corollary with how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breath. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”
from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
Emergent Architectures
Move fast and break things
NOSQL
Scale out
AWS
• MongoDB is mostly I/O bound
• Storage matters
• EBS (anywhere from 70 to 300 ops/sec)
• EBS provisioned IOPS (stable)
• Ephemeral
• SSD (much higher ops/sec but costly)
• use RAID on EC2 (or not?)
MongoDB AWS Storage
AWS
• Naming really matters
– combine with Route 53
– ec2-174-129-227-92.compute-1.amazonaws.com?
Sharded Setup
MongoDB on AWS
Infrastructure as code
Use Cases
• Real-Time Analytics Software
• Operational Intelligence
• High Volume Data Feeds
• Hadoop
Patterns
• Pre-Aggregation
• Batch
– Hadoop
– MapReduce (in MongoDB)
– Aggregation Framework
Pre-Aggregation
• Problem:
– You require up-to-the-minute data, or up-to-the-second if possible
– Queries for ranges of data (by time) must be as fast as possible
• Best practices
– $inc and upsert are your friends
– pre-allocate documents
– use REST interface
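A minimal sketch of the $inc-plus-upsert idea, assuming one counter document per page and day with nested hourly and minute counters. The document layout, the `_id` format and the function name are illustrative assumptions, not taken from the talk.

```python
from datetime import datetime, timezone


def build_hit_update(page: str, ts: datetime):
    """Return a (filter, update) pair that counts one hit for `page` at `ts`.

    Assumed layout: one document per page and day, with counters for the
    whole day, for each hour, and for each minute within an hour.
    """
    day = ts.strftime("%Y-%m-%d")
    flt = {"_id": f"{page}:{day}"}
    update = {
        "$inc": {
            "daily": 1,
            f"hourly.{ts.hour}": 1,
            f"minute.{ts.hour}.{ts.minute}": 1,
        }
    }
    return flt, update


flt, update = build_hit_update(
    "/index.html", datetime(2012, 10, 15, 12, 15, tzinfo=timezone.utc)
)
# With a driver such as PyMongo, this pair would typically be sent as
#   collection.update_one(flt, update, upsert=True)
# so the first hit of the day creates the document and later hits increment it.
```

Reading a time range then means fetching a handful of per-day documents instead of scanning raw events. Pre-allocating the day's document with zeroed counters would avoid document growth on the first increments.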
Batch
• MapReduce
• Aggregation Framework
• Mongo-Hadoop Connector
Mongo Hadoop Connector
[Figure labels: Data Storage, Data Processing]
Projects
• What we have done so far...
Real Time Twitter Heatmap
• The bubbles in the sea?
Friendly Floatees!
Friendly Floatees
Flow
Real Time Twitter Heatmap
• MongoDB Capped Collections
• Flask
• Redis
• Google Maps
• heatmap.js
• Server-Sent Events
• http://bit.ly/Ou5SsP
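For readers unfamiliar with Server-Sent Events, here is a small sketch of the wire format a server would stream to the browser. The event name `tweet`, the payload fields and the function name are assumptions for illustration; the talk does not show the actual payload.

```python
import json


def sse_event(payload: dict, event: str = "tweet") -> str:
    """Serialize one payload as a Server-Sent Events frame.

    A frame is a set of "field: value" lines terminated by a blank line.
    """
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


# Hypothetical geotagged point for the heatmap.
frame = sse_event({"lat": 48.14, "lng": 11.58})
```

In a Flask app, frames like this would typically be yielded from a generator and returned in a response with the `text/event-stream` content type.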
Pizza Quattro Shardoni
Quattro Shardoni
• Technology Showcase Product
• Complete End2End stack
• Real Time Charting
• Batch Reporting based on Hadoop
• Talk today at 12:15, Ballsaal A
Tom Zorc, Bernd Zuther
Operational Intelligence
• Analyze behavior of users in web shop
• Recommend next best activity (NBA) for business
• Real Time Analytics
Online Shop
REST
Operational Intelligence
• Next best activity for support/call center
• interpret user session
– e.g. “RaspberryPi - strong interest”
• expected: 2000 events per second
It’s Real Time!
Big Data Project
• “which analyzes and visualizes data of mobile networks”
• started as prototype, in production now ;-)
• “beyond agile”
• going from
– fetch all, calculate in service layer
– use MongoDB MapReduce on single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)
• why not use Aggregation Framework?
– we started with MongoDB 2.0.6 (the Aggregation Framework arrived in 2.2)
– would have had to change data model
– M/R seemed the way to go (data size)
• Numbers
– data comes in weekly increments
– xTB raw data
– 14GB / week (into MongoDB)
– data grows in direct proportion to polygon count
– currently 1 replica set of 3 m2.4xlarge instances
MongoDB on AWS
Big Data Project
• Geo Spatial Features
– $within queries (bounding box)
– $near queries
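The two query shapes can be sketched as plain query documents. This assumes the legacy coordinate-pair syntax of the MongoDB 2.0/2.2 era and a hypothetical field named `loc` with a 2d index; later MongoDB releases deprecate `$within` in favor of `$geoWithin`.

```python
def within_box(field, lower_left, upper_right):
    """Query document for points inside a bounding box (legacy $within/$box)."""
    return {field: {"$within": {"$box": [list(lower_left), list(upper_right)]}}}


def near(field, point, max_distance=None):
    """Query document for points sorted by distance to `point` (legacy $near)."""
    condition = {"$near": list(point)}
    if max_distance is not None:
        # $maxDistance is a sibling of $near in the legacy syntax.
        condition["$maxDistance"] = max_distance
    return {field: condition}


# Hypothetical coordinates (longitude, latitude) around Munich.
box_query = within_box("loc", (11.4, 48.0), (11.7, 48.3))
near_query = near("loc", (11.58, 48.14), max_distance=0.1)
```

Either document would be passed to a driver's find call on a collection whose `loc` field carries a geospatial index.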
Big Data Project
[Figure labels: Raw Data, MapReduce]
Big Data Project
• more polygons -> more data
– key length can become an issue
• using polygons to display cell metrics
• tried different types of visualizations
• key-size per doc: 1.8KB
– bad: { very_descriptive_long_key : “yay” }
– good: { v : “yay” }
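Field names are stored in every BSON document, so long keys are paid for once per document. A rough sketch of shortening keys at the application boundary follows. The mapping and the helper name are hypothetical, and JSON length is used only as a stand-in for BSON size.

```python
import json

# Hypothetical mapping from readable field names to short stored names.
KEY_MAP = {"very_descriptive_long_key": "v"}


def shorten_keys(doc: dict, key_map: dict) -> dict:
    """Rename top-level keys using key_map; unknown keys pass through."""
    return {key_map.get(key, key): value for key, value in doc.items()}


long_doc = {"very_descriptive_long_key": "yay"}
short_doc = shorten_keys(long_doc, KEY_MAP)

# Characters saved for this single field in a single document.
saved = len(json.dumps(long_doc)) - len(json.dumps(short_doc))
```

Here `saved` is 24 characters for one field in one document. Multiplied across many fields and many documents, this is how key names alone can add up to a noticeable share of storage.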
Big Data Project
[Bar chart, GB / year (axis 0 to 400): values 62 and 308; series: 100000 polygons and 500000 polygons]
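A quick sanity check of the chart against the claim that data grows in direct proportion to polygon count. This sketch assumes the chart pairs 62 GB / year with 100000 polygons and 308 GB / year with 500000 polygons (inferred from the order of the labels); the function name is hypothetical.

```python
def projected_gb_per_year(polygons, ref_polygons=100_000, ref_gb=62.0):
    """Linear projection of yearly storage from a reference measurement."""
    return ref_gb * polygons / ref_polygons


estimate = projected_gb_per_year(500_000)  # 310.0 GB / year
reported = 308.0                           # value read from the chart
relative_gap = abs(estimate - reported) / reported
```

Under that assumption the linear projection lands within about 1% of the charted value, which is consistent with proportional growth.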
Big Data Project
• 308GB of EBS storage => $332 per year
– backups / snapshots not considered
• Future Plans
– new Use Case
– expecting about 1TB of data / week
Conclusion
• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as part of code
• MongoDB provides flexibility
Comments?
• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/
We are hiring!
• http://careers.comsysto.com