Using the MongoDB Monitoring Service (MMS)
DESCRIPTION
This talk covers MMS, the MongoDB Monitoring Service. MMS is a free SaaS monitoring solution for MongoDB, built by 10gen and used by many MongoDB users. Monitoring is necessary for any production database system in order to detect upcoming or ongoing issues. It also gives insight into the vitals of your system and can help detect bottlenecks and inefficiencies, leading to improved performance. This talk will focus on:
- what MMS is and how to get started
- understanding each metric and graph
- the signs of trouble, and when to take action or panic
- the signs that your hardware resources are not being used properly
- how we built MMS, the high-performance time series system
TRANSCRIPT
Engineer, 10gen
Mark Hillick - @markofu
#mongosv
Using the MongoDB Monitoring Service (MMS)
What, where, numbers?
What is MMS?
• MongoDB monitoring SaaS solution with:
– Per-minute granularity
– Alerting: host up / down, metrics, etc.
– Event tracking (server restart, step down, …)
• Host management (auto discover)
• Profiling
• Hardware stats also
Why use MMS? (1)
• Overview – Bird’s Eye
– Macro
• Drill down (minute by minute)
– Micro
Why use MMS? (2)
• Haz all teh things
• Tailored specifically for MongoDB
• Incredibly helpful for 10gen Support when troubleshooting
A few numbers …
• Monitors over 19k database servers
• 40k writes per second
• 400 metrics per ping packet
• 9 billion metrics recorded per day
How?
Set up MMS – it’s easy
• Go to http://mms.10gen.com
– Create a new account or sign in with your JIRA user
– Pick an explicit company name
– Download and run the agent
– From MMS dashboard, add a host to monitor
The MMS client (agent)
• Small Python app
• A single agent process
– Failover – multiple agents
• Connect to mms.10gen.com (SSL over TCP 443)
Host
Operational Stats
Alerting
Alerts - Config
All good
Alerts - Closed
Events
Security
• Purely stats (metadata)
– Log transfer has to be turned on
• HTTPS & connections are outbound only (from the agent)
• If profiling is enabled in both the database and MMS, profiling data is sent
On-premise MMS
• Locally Hosted in Customer Infrastructure
• PCI, HIPAA etc
• Enterprise Customers (2.4)
Measure me!!!
Metrics
• Source : http://www.kaushik.net/avinash/wp-content/uploads/2007/10/metrics.jpg
Opcounters
• Count of every operation per second
• getMore – each batch of a query
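MongoDB reports opcounters as cumulative totals, so a per-second figure has to be derived from two samples. The sketch below is one way to do that. It assumes the samples are the `opcounters` sub-document of `serverStatus` (obtainable, for example, with pymongo's `client.admin.command("serverStatus")`). The function name and the sample values are illustrative, and this is not necessarily how the MMS agent computes its rates.

```python
def opcounter_rates(prev, curr, elapsed_seconds):
    """Return per-second rates between two cumulative opcounters samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op, count in curr.items():
        delta = count - prev.get(op, 0)
        # A negative delta means the counters were reset (e.g. a restart),
        # so no meaningful rate can be derived for this interval.
        rates[op] = None if delta < 0 else delta / elapsed_seconds
    return rates


# Two hypothetical samples taken 60 seconds apart.
prev = {"insert": 1000, "query": 5000, "update": 200,
        "delete": 10, "getmore": 300, "command": 900}
curr = {"insert": 1600, "query": 8000, "update": 260,
        "delete": 10, "getmore": 420, "command": 1500}

rates = opcounter_rates(prev, curr, 60)
print(rates)
```

Because each getMore fetches one batch of a query's results, a high `getmore` rate relative to `query` suggests queries returning large result sets.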
Memory
• Mapped: sum of files on disk
• Virtual memory: 2 × mapped (with journaling) + process overhead
• Resident memory: data in RAM actively used
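The relationship between the three memory figures can be checked with a small calculation. This sketch assumes a `mem` document shaped like the one in `serverStatus`, with `resident`, `virtual` and `mapped` reported in megabytes. The function name and the sample values are hypothetical.

```python
def memory_summary(mem, journaled=True):
    """Summarise mapped/virtual/resident memory figures (all in MB)."""
    mapped = mem["mapped"]
    # With journaling the data files are mapped twice, so virtual memory
    # is expected to be roughly 2 x mapped plus process overhead.
    expected_virtual = 2 * mapped if journaled else mapped
    return {
        # Share of the mapped data currently held in RAM.
        "resident_fraction": mem["resident"] / mapped if mapped else None,
        # Whatever is left over is attributed to process overhead.
        "virtual_overhead_mb": mem["virtual"] - expected_virtual,
    }


# Hypothetical sample values.
summary = memory_summary({"resident": 2048, "virtual": 8400, "mapped": 4096})
print(summary)
```

A `virtual_overhead_mb` far larger than expected could point at something other than the data files, such as a large number of open connections, although the right threshold depends on the deployment.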
Lock %
• Amount of time spent in the write lock
• From 2.2 : each db has own lock
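Lock % is a ratio of two cumulative timers: time spent holding the write lock over wall-clock time elapsed. The sketch below assumes two samples that each carry `lockTime` and `totalTime` in microseconds. Those field names follow the older `globalLock` layout and are an assumption here; from 2.2 onward the per-database lock times live elsewhere in `serverStatus`, so the inputs would need adapting.

```python
def lock_percent(prev, curr):
    """Percentage of the sampling interval spent in the write lock."""
    total = curr["totalTime"] - prev["totalTime"]
    locked = curr["lockTime"] - prev["lockTime"]
    if total <= 0 or locked < 0:
        # Counters went backwards or no time elapsed: no usable interval.
        return None
    return 100.0 * locked / total


# Hypothetical samples 60 seconds apart (values in microseconds).
prev = {"totalTime": 1_000_000, "lockTime": 100_000}
curr = {"totalTime": 61_000_000, "lockTime": 6_100_000}

pct = lock_percent(prev, curr)
print(pct)
```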
Background flush
• Flush every 60 seconds
• Watch: if flush time gets close to sync delay
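The "flush time close to sync delay" check can be expressed as a simple ratio. This sketch assumes a document shaped like the `backgroundFlushing` section of `serverStatus`, whose `last_ms` field holds the duration of the most recent flush. The 0.5 warning threshold is an arbitrary illustrative value, not an MMS default.

```python
def flush_ratio(background_flushing, sync_delay_seconds=60):
    """Fraction of the sync delay consumed by the most recent flush."""
    return background_flushing["last_ms"] / (sync_delay_seconds * 1000)


def flush_is_concerning(background_flushing, sync_delay_seconds=60,
                        threshold=0.5):
    """True when the last flush used more than `threshold` of the interval."""
    return flush_ratio(background_flushing, sync_delay_seconds) > threshold


# Hypothetical samples: a 3-second flush and a 45-second flush.
healthy = {"last_ms": 3000}
struggling = {"last_ms": 45000}

print(flush_ratio(healthy), flush_is_concerning(healthy))
print(flush_ratio(struggling), flush_is_concerning(struggling))
```

A ratio approaching 1.0 means the disks are barely finishing one flush before the next is due.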
Page faults
• Disk IO
• Readahead
Replication
• On primary: length of time covered by the oplog
• On secondary: replication delay to primary
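Both replication figures are differences between timestamps. The sketch below assumes the first and last oplog entry times, and the latest operation times on the primary and a secondary, are already available as `datetime` values. How they are fetched is left out, and the function names and sample values are illustrative.

```python
from datetime import datetime


def oplog_window_hours(first_entry, last_entry):
    """Hours of history the oplog currently holds (measured on the primary)."""
    return (last_entry - first_entry).total_seconds() / 3600


def replication_lag_seconds(primary_optime, secondary_optime):
    """Seconds by which a secondary trails the primary (never negative)."""
    return max(0.0, (primary_optime - secondary_optime).total_seconds())


# Hypothetical timestamps.
window = oplog_window_hours(datetime(2012, 12, 1, 0, 0),
                            datetime(2012, 12, 2, 12, 0))
lag = replication_lag_seconds(datetime(2012, 12, 2, 12, 0, 30),
                              datetime(2012, 12, 2, 12, 0, 0))
print(window, lag)
```

If the lag of a secondary ever exceeds the oplog window, that secondary can no longer catch up from the oplog and needs a full resync, which is why the two numbers are worth watching together.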
Metrics that we discussed
• Opcounters
• Lock %
• Background Flush
• Page Faults
• Replication
Metrics for performance
• Resident memory: how much data in RAM?
• Page Faults: paging to disk? Readahead?
• Journal commits in write lock: move the journal to a separate disk
• High background flush: reduce sync delay to smooth out the flushes
Documentation
Docs? Where?
• Manual : https://mms.10gen.com/help/
– Web
– PDF
• FAQ : https://mms.10gen.com/docs/faq
Futures
Feature Request
• JIRA Ticket - MMSSUPPORT
Coming up…
• Data visualization, e.g. shard distribution (Q1 2013)???
• Move from Python to Java
Conclusion
• Easy to use
• Macro & micro
• Detailed monitoring features
• Aids 10gen Support immensely
Engineer, 10gen
Mark Hillick - @markofu
#mongosv
Questions?