Running & Monitoring Docker at Scale
DESCRIPTION
Containerization (à la Docker) is increasing the elastic nature of cloud infrastructure by an order of magnitude. If you have adopted Docker, or are considering it, you are probably facing questions like:
- How many containers can you run on a given Amazon EC2 instance type?
- Which metric should you look at to measure contention?
- How do you manage fleets of containers at scale?

Datadog’s CTO, Alexis Lê-Quôc, presents the challenges and benefits of running Docker containers at scale. Alexis explains how to use quantitative performance patterns to monitor your infrastructure at the new level of magnitude and increased complexity introduced by containerization.
TRANSCRIPT
![Page 1: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/1.jpg)
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 12th, 2014 | Las Vegas
Monitoring and Running Docker Containers at Scale
Alexis Lê-Quôc, Datadog
![Page 2: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/2.jpg)
@alq — CTO at Datadog
![Page 3: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/3.jpg)
Datadog
• Monitoring service
• Made for the cloud
• Aggregates everything
• Support for Docker (since 1.0)
![Page 4: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/4.jpg)
Goals
1. Present key Docker metrics
2. Explain operational complexity
3. Rethink monitoring of Docker containers
![Page 5: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/5.jpg)
Agenda
• A (very) brief history of containers
• Docker containers on AWS
• Key Docker metrics
• Operational complexity
• Monitoring Docker effectively
• Demo
![Page 6: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/6.jpg)
A brief history of containers
![Page 7: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/7.jpg)
Containers in a nutshell
• Been around for a long time
  – jails, zones, cgroups
• No full-virtualization overhead
• Used for runtime isolation (e.g. jails)
• Docker: escape from dependency hell
![Page 8: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/8.jpg)
Escape from dependency hell
a.out
shared libs
packages
omnibus
Docker ~
![Page 9: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/9.jpg)
Container ~ single static binary

| Process | Container | Host |
|---|---|---|
| Source | Dockerfile | Chef/Puppet, Kickstart |
| .TEXT | /var/lib/docker | Full distro |
| PID | Name/ID | Hostname |
![Page 10: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/10.jpg)
Docker on AWS: some numbers
![Page 11: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/11.jpg)
(Some) Docker use cases
• Continuous integration
  – eliminate dependency variance
  – same code from dev laptop to production
  – git-like workflow
• Continuous delivery
  – (quasi) stateless components
  – web workers, video encoders, etc.
  – not for data stores (Amazon RDS is a better fit)
![Page 12: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/12.jpg)
Instance types
• c3.2xl: 20%
• m3.medium: 20%
• m3.large: 19%
• m3.xlarge: 13%
• m1.large: 8%
• the rest: 21%
Source: Datadog, October 2014
![Page 13: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/13.jpg)
Containers per instance
• Average: 5 (October 2014)
• Highly dependent on the workload
• This is just the beginning…
• Expect higher container density going forward
Source: Datadog, October 2014
![Page 14: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/14.jpg)
Key Docker metrics
![Page 15: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/15.jpg)
Monitoring fundamentals

| Work | Resource consumption |
|---|---|
| Measures the amount of value created | Measures the amount of resources consumed to create value |
| What your customers care about | What your customers don’t care about |
| Database: queries answered | Database: I/O throughput |
| Web server: requests served | Web server: active connections |
| Queue: wait time distribution | OS: CPU utilization |
| | Container: memory footprint |
![Page 16: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/16.jpg)
Docker containers consume…
• Memory
• CPU
• I/O
• Network
![Page 17: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/17.jpg)
Memory

| Name | Why it matters |
|---|---|
| pgmajfault | Paging to/from disk is slow |
| pgfault | Context switches hurt application performance |
| resident set size (rss) | Too much RSS causes paging and swapping |
| swap | Swapping in/out is slow |
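
The four memory counters above can be read from the container's cgroup. What follows is a minimal sketch, assuming cgroup v1 and the usual `name value` layout of `memory.stat`. The file path in the comment, the function name `parse_memory_stat`, and the sample values are illustrative assumptions and do not come from the talk. The `swap` line typically appears only when swap accounting is enabled on the host.

```python
# Sketch: extract the memory counters named on the slide from cgroup v1
# memory.stat text. Assumed location (varies by Docker version and driver):
#   /sys/fs/cgroup/memory/docker/<container-id>/memory.stat
KEYS = ("pgmajfault", "pgfault", "rss", "swap")


def parse_memory_stat(text):
    """Return a dict with the slide's counters found in memory.stat text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Each line is "<name> <integer value>"; keep only the keys we want.
        if len(parts) == 2 and parts[0] in KEYS:
            stats[parts[0]] = int(parts[1])
    return stats


# Hypothetical sample content, for illustration only.
SAMPLE = """cache 4096
rss 104857600
swap 0
pgfault 2500
pgmajfault 12
"""

memory = parse_memory_stat(SAMPLE)
print(memory)
```

`pgfault` and `pgmajfault` are cumulative counters, so a monitor would normally chart their rate of change rather than the raw value.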
![Page 18: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/18.jpg)
CPU

| Name | Why it matters |
|---|---|
| user | Measures work being done |
| system | System calls, a necessary evil |
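
As a rough illustration of how the user and system figures can be derived, the sketch below turns two samples of a cgroup v1 `cpuacct.stat` file into percentages. It assumes the file reports cumulative ticks and that the tick rate is 100 per second, which is common but not guaranteed. The function names and sample numbers are hypothetical.

```python
# Sketch: turn two cpuacct.stat samples into user/system CPU percentages.
# cgroup v1 reports cumulative ticks (USER_HZ, commonly 100 per second).
# The tick rate is an assumption; os.sysconf("SC_CLK_TCK") gives the real one.
def parse_cpuacct_stat(text):
    """Parse 'user N' / 'system N' lines into a dict of tick counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_percent(before, after, interval_s, ticks_per_second=100):
    """Percentage of one core spent in user and system mode over the interval."""
    return {
        mode: 100.0 * (after[mode] - before[mode]) / (ticks_per_second * interval_s)
        for mode in ("user", "system")
    }


# Hypothetical samples taken 10 seconds apart.
first = parse_cpuacct_stat("user 1000\nsystem 200\n")
second = parse_cpuacct_stat("user 1500\nsystem 250\n")
usage = cpu_percent(first, second, interval_s=10)
print(usage)
```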
![Page 19: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/19.jpg)
Block I/O

| Name | Why it matters |
|---|---|
| blkio.io_service_bytes | I/O is (often) the bottleneck |
| blkio.io_queued | Measures saturation |
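
To make the throughput counter concrete, here is a small sketch that sums read and write bytes across devices from the text of a cgroup v1 `blkio.io_service_bytes` file. It assumes the usual `major:minor Operation bytes` line layout. The function name and the sample content are illustrative only.

```python
# Sketch: sum read/write bytes across devices from blkio.io_service_bytes.
# Assumed cgroup v1 layout: "<major>:<minor> <Operation> <bytes>" per line,
# followed by a final "Total <bytes>" line that has only two fields.
def parse_io_service_bytes(text):
    """Return total bytes read and written across all block devices."""
    totals = {"Read": 0, "Write": 0}
    for line in text.splitlines():
        parts = line.split()
        # Only per-device Read/Write lines count; Sync, Async and Total
        # lines would double-count the same bytes.
        if len(parts) == 3 and parts[1] in totals:
            totals[parts[1]] += int(parts[2])
    return totals


# Hypothetical sample with two devices.
SAMPLE = """8:0 Read 1024
8:0 Write 2048
8:0 Sync 2048
8:0 Async 1024
8:0 Total 3072
8:16 Read 512
8:16 Write 0
8:16 Total 512
Total 3584
"""

io_bytes = parse_io_service_bytes(SAMPLE)
print(io_bytes)
```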
![Page 20: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/20.jpg)
Network

| Name | Why it matters |
|---|---|
| tx/rx_errors | Because… errors are bad. |
| tx/rx_dropped | Measures contention |
| tx/rx_bytes | Measures traffic |
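
One possible source for these counters is the `/proc/net/dev` table as seen from inside the container's network namespace. The sketch below parses that text format. The choice of data source, the function name, and the sample values are assumptions made for illustration; the talk does not say how the counters are read.

```python
# Sketch: pull the slide's network counters out of /proc/net/dev text.
# Column positions follow the standard layout: 8 receive columns
# (bytes packets errs drop ...) followed by 8 transmit columns.
FIELDS = {
    "rx_bytes": 0, "rx_errors": 2, "rx_dropped": 3,
    "tx_bytes": 8, "tx_errors": 10, "tx_dropped": 11,
}


def parse_net_dev(text):
    """Return {interface: {counter: value}} for the counters in FIELDS."""
    interfaces = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        values = [int(v) for v in counters.split()]
        interfaces[name.strip()] = {key: values[i] for key, i in FIELDS.items()}
    return interfaces


# Hypothetical sample, for illustration only.
SAMPLE = (
    "Inter-|   Receive                    |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
    "  eth0: 1000 10 1 2 0 0 0 0 2000 20 3 4 0 0 0 0\n"
)

net = parse_net_dev(SAMPLE)
print(net["eth0"])
```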
![Page 21: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/21.jpg)
How to collect metrics
• https://github.com/google/cadvisor
![Page 22: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/22.jpg)
Operational complexity
![Page 23: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/23.jpg)
Combinatorial multiplication
[Diagram: three stacks side by side. (1) Hardware, OS, Off-the-shelf, Your Application. (2) Hardware, Hypervisor, two OSes, each with Off-the-shelf and App. (3) Hardware, Hypervisor, two OSes, each hosting Containers (boxes labeled A and O).]
![Page 24: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/24.jpg)
Operational complexity
• Average containers per instance: N (N=5, 10/2014)
• N-times as many “hosts” to manage
• Affects
  – provisioning: prep’ing & building containers
  – configuration: passing config to containers
  – orchestration: deciding where/when containers run
  – monitoring: making sure containers run properly
![Page 25: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/25.jpg)
Monitoring: metric counts on Amazon EC2
• 1 Amazon EC2 instance – 10 CloudWatch metrics
• 1 operating system (e.g. Linux) – 100 metrics
• 1 Container – 50 metrics
• 1 off-the-shelf application – ~50 metrics
![Page 26: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/26.jpg)
Combinatorial multiplication
100 instances, 500 containers
Assuming only 5 containers per instance
![Page 27: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/27.jpg)
Combinatorial multiplication
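160 metrics per instance without containers (10 CloudWatch + 100 OS + 50 application)
410 metrics per instance with containers (160 + 5 × 50 container metrics)
Assuming only 5 containers per instance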
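
The arithmetic behind these figures can be sketched as follows, using the per-layer counts from the earlier metric-count slide. Reading 160 as the count without containers and 410 as the count with 5 containers is an interpretation that happens to match the numbers. The fleet size of 100 instances is taken from the previous slide, and the function name is hypothetical.

```python
# Per-layer metric counts quoted in the talk.
EC2_METRICS = 10         # CloudWatch metrics per EC2 instance
OS_METRICS = 100         # operating system metrics
CONTAINER_METRICS = 50   # metrics per container
APP_METRICS = 50         # off-the-shelf application (the slide says "~50")


def metrics_per_instance(containers):
    """Metrics emitted by one instance running the given number of containers."""
    return EC2_METRICS + OS_METRICS + APP_METRICS + containers * CONTAINER_METRICS


without_containers = metrics_per_instance(0)
with_containers = metrics_per_instance(5)
fleet_total = 100 * with_containers  # 100 instances, 5 containers each

print(without_containers, with_containers, fleet_total)
```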
![Page 28: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/28.jpg)
Velocity
EC2 instance half-life: hours, days, months
Container half-life: minutes, hours, days
![Page 29: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/29.jpg)
Aggravating factors
• Hub-based provisioning
  – new images every day
• Autonomic orchestration
  – from imperative to declarative
  – automated
  – individual containers don’t matter
  – e.g. Kubernetes, Mesos
![Page 30: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/30.jpg)
A lot more, A lot faster.
![Page 31: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/31.jpg)
If your monitoring is still centered on individual hosts or instances…
![Page 32: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/32.jpg)
Host-centric monitoring
[Diagram: the stack Hypervisor, OS, Containers (boxes labeled A and O), with two “Monitor” callouts and a “GAP” callout.]
![Page 33: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/33.jpg)
A lot more pain, A lot faster.
![Page 34: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/34.jpg)
Monitoring containers effectively
![Page 35: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/35.jpg)
A new approach to container monitoring
![Page 36: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/36.jpg)
Layers + Tags
![Page 37: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/37.jpg)
Layers of monitoring
[Diagram: the stack Hypervisor, OS, Containers (boxes labeled A and O), with a single “Monitor” callout.]
![Page 38: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/38.jpg)
Layers of monitoring
[Diagram: the stack Hypervisor, OS, Containers (boxes labeled A and O), with three monitoring layers: CloudWatch, Infrastructure Monitoring, APM.]
![Page 39: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/39.jpg)
Layers of monitoring
[Diagram: the same stack and monitoring layers (CloudWatch, Infrastructure Monitoring, APM), annotated with example metrics: cpu/net/io, filesystem, docker mem, docker cpu, db queries, web requests, app throughput.]
![Page 40: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/40.jpg)
Layers of monitoring
• Access to metrics from all the layers
• Amazon CloudWatch, OS metrics, Docker metrics, app metrics in 1 place
• Shared timeline
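
One way to picture "all layers in one place on a shared timeline" is to merge samples from each layer into a single structure keyed by timestamp. The sketch below does that. The layer names, metric names, timestamps, and values are hypothetical, and this is not a description of how any particular monitoring product stores data.

```python
from collections import defaultdict


def shared_timeline(layers):
    """Merge {layer: [(timestamp, metric, value), ...]} into one timeline.

    The result maps each timestamp to a dict of "layer.metric" -> value,
    so metrics from every layer can be read side by side.
    """
    timeline = defaultdict(dict)
    for layer, samples in layers.items():
        for timestamp, metric, value in samples:
            timeline[timestamp][f"{layer}.{metric}"] = value
    return dict(sorted(timeline.items()))


# Hypothetical samples from three layers, timestamps in seconds.
layers = {
    "cloudwatch": [(60, "cpu", 35.0), (120, "cpu", 80.0)],
    "docker": [(60, "rss_mb", 400), (120, "rss_mb", 950)],
    "app": [(120, "requests_per_s", 12)],
}

timeline = shared_timeline(layers)
for timestamp, metrics in timeline.items():
    print(timestamp, metrics)
```

With the layers aligned this way, a CPU spike at the instance level can be compared directly against container memory and application throughput at the same moment.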
![Page 41: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/41.jpg)
If your monitoring does not cover all layers, pain.
![Page 42: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/42.jpg)
Tags
You use them already
![Page 43: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/43.jpg)
Tags
• Monitoring is like Auto-Scaling Groups
• Monitoring is like Docker orchestration
• From imperative to declarative
• Query-based
• Queries operate on tags
![Page 44: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/44.jpg)
Monitoring with tags and queries
“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… and make sure resident set size < 1GB on c3.xl”
![Page 46: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/46.jpg)
Monitoring with tags and queries
“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… that use more than 1.5x the average on c3.xl”
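
The queries on these slides can be sketched as filters over tagged containers. The example below is a toy in-memory version, not a real monitoring API. The tag keys (`image`, `region`, `instance_type`), the container records, and the helper names are all hypothetical.

```python
from statistics import mean

MIB = 2**20

# Hypothetical fleet: each container carries tags and a current RSS reading.
containers = [
    {"id": "c1", "tags": {"image": "web", "region": "us-west-2", "instance_type": "c3.xl"}, "rss_bytes": 400 * MIB},
    {"id": "c2", "tags": {"image": "web", "region": "us-west-2", "instance_type": "c3.xl"}, "rss_bytes": 420 * MIB},
    {"id": "c3", "tags": {"image": "web", "region": "us-west-2", "instance_type": "c3.xl"}, "rss_bytes": 1200 * MIB},
    {"id": "c4", "tags": {"image": "db", "region": "us-west-2", "instance_type": "c3.xl"}, "rss_bytes": 3000 * MIB},
    {"id": "c5", "tags": {"image": "web", "region": "us-east-1", "instance_type": "c3.xl"}, "rss_bytes": 5000 * MIB},
]


def select(items, **tags):
    """Declarative selection: keep containers whose tags match every given pair."""
    return [c for c in items if all(c["tags"].get(k) == v for k, v in tags.items())]


def over_limit(items, limit_bytes):
    """Containers whose resident set size exceeds a fixed limit."""
    return [c["id"] for c in items if c["rss_bytes"] > limit_bytes]


def above_average(items, factor=1.5):
    """Containers using more than `factor` times the group's average RSS."""
    if not items:
        return []
    average = mean(c["rss_bytes"] for c in items)
    return [c["id"] for c in items if c["rss_bytes"] > factor * average]


web = select(containers, image="web", region="us-west-2", instance_type="c3.xl")
too_big = over_limit(web, 1024 * MIB)   # "resident set size < 1GB"
outliers = above_average(web)           # "more than 1.5x the average"
print(len(web), too_big, outliers)
```

The query names no individual container or host, so it keeps working as containers come and go, which is the point of moving from imperative to declarative monitoring.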
![Page 47: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/47.jpg)
“Dude, where’s my server?”
![Page 48: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/48.jpg)
“Dude, where’s my container?”
![Page 49: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/49.jpg)
If your monitoring is not tag-based, pain.
![Page 50: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/50.jpg)
Demo
![Page 51: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/51.jpg)
Take-aways
1. Docker increases operational complexity by an order of magnitude unless…
2. You have layered monitoring, from the instance to the container and to the application, and…
3. You monitor using tags and queries
![Page 52: Running & Monitoring Docker at Scale](https://reader033.vdocuments.net/reader033/viewer/2022052911/559df2201a28ab387d8b456b/html5/thumbnails/52.jpg)
Please give us your feedback on this presentation
Join the conversation on Twitter with #reinvent