Monitoring Docker Containers & Dockerized Applications

Anantha Padmanabhan CB (@cbananth), Rahul Krishna Upadhyaya (@rakrup_), Satya Sanjivani Routray (@er_sanj007), Meenakshi Sundaram Lakshmanan (@lxmeenakshi1)

Cloud and Network Solutions, Cisco Systems Inc.

Upload: satya-sanjibani-routray

Post on 16-Apr-2017



Agenda
- Introduction
- Monitoring Containers - Challenges
- Approach
- Design
- Demo
- Q&A

Containers Introduction
- Containers virtualize the OS, just as hypervisors virtualize the hardware.
- Containers enable any payload to be encapsulated as a lightweight, portable, self-sufficient unit that can be manipulated using standard operations and run consistently on any hardware platform.
- A container wraps a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, libraries, etc. Containers share the OS kernel, and bins/libs where needed; otherwise each one operates in a self-contained environment.

Containers Introduction
- Docker and LXC are among the most popular container implementations today.
- Containers can run on any Linux server: VMs, physical hosts, OpenStack, and so on.
- They can move between machines without any modification.
- They can work together with other containers.

Monitoring Containers - Challenges
Traditionally, monitoring brings to mind monitoring the infrastructure (servers, networks) and monitoring the apps that run on it. In the world of containers, monitoring the infrastructure alone or the application alone may not provide the full picture.

Complete Monitoring = (App + software-defined components/devices + Infra)

Challenges with the monitoring tools:
- A vast set of monitoring tools is needed to collect the various statistics.
- Each tool reports a different set of attributes in a different format.
- Data collection tools may overload the container itself, making the statistics inaccurate.
- Metrics are hard to differentiate for containers that are related and share resources.
- Above all, a lot of computation is required to draw meaningful inferences from all the data that is collected.

Monitoring Containers - Challenges
- Categorizing container utilization and statistics for multitenant applications is complex.
- Different applications produce logs in different formats.
- Failure points of applications must be identified.
- The interconnectivity between applications in different containers, hosts, or regions must be analyzed.
- Assessing the response time of a web-based cloud application is complicated, because many other parameters (region, internet speed) can influence it.
- Clustered applications might require monitoring all instances to identify the faulty node.

Monitoring Containers - Approach
- Apps are embedded within containers, which in turn run within a VM or physical host.
- Containerization requires monitoring at each of these levels in order to collect complete statistics.
- Containers can be linked, so the ability to monitor and make sense of statistics from linked containers becomes critical.
- Collected data must be intelligently correlated in the context of the App-Container-Host relation.
- Monitoring methods and data should be abstracted to enable integration with any monitoring tool of choice.
- Monitoring should be proactive, reactive, and adaptive.

Monitoring at different levels

Host

Container

Application

Cluster

What to Monitor?
The following are the major sets of parameters that can be monitored:
- CPU: total_usage, per_cpu_usage, system_usage, host_usage, load_average, etc.
- Memory: mem_pgfault, mem_usage, mem_cache, mem_kernel, etc.

What to Monitor
- Disk: total_bytes, bytes_read, bytes_written, bytes_async, bytes_sync, etc.
- Network: rxbytes, rxpackets, rxdropped, rxerrors, txbytes, txerrors, etc.
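As an illustration of turning raw counters into a usable figure, the sketch below derives a CPU percentage from two successive samples. It assumes that total_usage and system_usage are cumulative counters (so only their deltas are meaningful) and that per_cpu_usage holds one entry per CPU; the function name `cpu_percent` and the sample values are hypothetical and not part of the original slides.

```python
def cpu_percent(prev, cur):
    """Return container CPU utilisation in percent between two samples.

    Assumption: total_usage and system_usage are cumulative counters,
    and per_cpu_usage has one entry per CPU visible to the container.
    """
    cpu_delta = cur["total_usage"] - prev["total_usage"]
    system_delta = cur["system_usage"] - prev["system_usage"]
    # Guard against counter resets and identical samples.
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    ncpus = len(cur["per_cpu_usage"]) or 1
    return cpu_delta / system_delta * ncpus * 100.0


# Hypothetical samples taken one collection interval apart.
previous = {
    "total_usage": 1_000_000,
    "system_usage": 100_000_000,
    "per_cpu_usage": [600_000, 400_000],
}
current = {
    "total_usage": 26_000_000,
    "system_usage": 200_000_000,
    "per_cpu_usage": [13_600_000, 12_400_000],
}

usage = cpu_percent(previous, current)  # 50.0 for these sample values
```

The same delta-based treatment would apply to the other cumulative counters listed above, such as bytes_read or rxbytes.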

- Intelligently correlate the collected data that is monitored at the different levels mentioned earlier.
- Enable queries and filters to make meaningful inferences from the raw data.
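A minimal sketch of what correlation and filtering could look like, assuming every sample is tagged with the host, container, and app it came from. The record layout, the sample values, and the helper names `correlate`, `query`, and `host_total` are illustrative assumptions, not the presenters' data model.

```python
from collections import defaultdict

# Hypothetical samples, each tagged with its position in the
# App -> Container -> Host hierarchy.
SAMPLES = [
    {"host": "host-1", "container": "web-1", "app": "nginx", "metric": "mem_usage", "value": 120},
    {"host": "host-1", "container": "db-1", "app": "mysql", "metric": "mem_usage", "value": 80},
    {"host": "host-2", "container": "web-2", "app": "nginx", "metric": "mem_usage", "value": 95},
    {"host": "host-1", "container": "web-1", "app": "nginx", "metric": "rxerrors", "value": 3},
]


def correlate(samples):
    """Nest samples as host -> container -> app -> {metric: value}."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for s in samples:
        tree[s["host"]][s["container"]][s["app"]][s["metric"]] = s["value"]
    return tree


def query(samples, **filters):
    """Return the samples whose fields equal every given filter."""
    return [s for s in samples if all(s.get(k) == v for k, v in filters.items())]


def host_total(samples, host, metric):
    """Sum one metric over all containers on a host."""
    return sum(s["value"] for s in query(samples, host=host, metric=metric))
```

For example, `query(SAMPLES, app="nginx", metric="mem_usage")` selects the same application across hosts, while `host_total(SAMPLES, "host-1", "mem_usage")` rolls container figures up to the host level.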

How to Monitor?
Monitoring Strategy

Proactive : Prevent failure situations

Reactive : Raise events and alerts when failures occur.

Adaptive : Automatically monitor new components and model statistics
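The three strategies can be sketched as one evaluation step over an incoming sample. This is only an illustration: the thresholds, the sample fields (`name`, `usage_percent`, `alive`), and the function name `evaluate` are assumptions introduced here.

```python
def evaluate(sample, known, warn_at=80.0, fail_at=100.0):
    """Return a list of (strategy, message) events for one sample.

    `known` is the set of component names already being monitored;
    it is updated in place when a new component shows up.
    """
    events = []
    name = sample["name"]

    # Adaptive: start monitoring components seen for the first time.
    if name not in known:
        known.add(name)
        events.append(("adaptive", f"now monitoring new component {name}"))

    usage = sample["usage_percent"]
    if not sample["alive"] or usage >= fail_at:
        # Reactive: the failure has already happened, so raise an alert.
        events.append(("reactive", f"alert: {name} has failed or is saturated"))
    elif usage >= warn_at:
        # Proactive: warn before the failure situation is reached.
        events.append(("proactive", f"warning: {name} is at {usage:.0f}% and approaching its limit"))
    return events


known_components = set()
first = evaluate({"name": "web-1", "usage_percent": 85.0, "alive": True}, known_components)
second = evaluate({"name": "web-1", "usage_percent": 40.0, "alive": False}, known_components)
```

Here `first` carries an adaptive event followed by a proactive warning, and `second` carries only a reactive alert because web-1 is already known.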

What to use when? How?
Different levels need different types of monitoring strategy.

Design Objectives
- Not overloading the Docker daemon.
- Different approaches to monitoring at different levels.
- Modular, driver-based approach for all possible components.
- Running multiple agent drivers simultaneously.
- Added considerations for linked/clustered containers.

High Level Component Design
[Figure: Monitoring Controller containing the Engine, API (REST), Data Storage, IQ, and Queue; clients are the CLI, UI, and REST client; one Agent per Host, each Host running several containers (C).]

Functions
[Figure labels: Host, Container, Apps; Collect Data/Logs; Model & Process; Data Store; Analyze; Present Result; Predictions/Suggestion.]

Agent
[Figure: on a Host, the Agent runs several Drivers that gather logs & stats from the Host, Containers, and Apps, then dumps them to a Queue that leads to the Engine.]

Agent
- One agent per host.
- The agent monitors the host, the containers on that host, and the applications in those containers.
- The agent sends to and receives from the engine asynchronously, using queues.
- Driver-based log/stats collection can be done for hosts, applications, and containers.
- Drivers based on the user's tool of choice for stats/log collection can be used for one or several of hosts, applications, and containers.
- More than one driver can run in parallel to collect more diverse parameters.
- The agent checks the collected data for sanity so that it conforms to the data model in the engine.
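The agent behaviour described above can be sketched roughly as follows. Everything named here (`StaticDriver`, `Agent`, `REQUIRED_FIELDS`, the record fields) is a hypothetical stand-in: a real driver would wrap the user's collection tool of choice, and the in-process `queue.Queue` stands in for whatever message queue connects the agent to the engine.

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import time

# Assumed minimal data model; the engine's real model is not given in the slides.
REQUIRED_FIELDS = ("name", "metric", "value")


class StaticDriver:
    """Stand-in driver that returns canned records for one level."""

    def __init__(self, level, records):
        self.level = level  # "host", "container" or "application"
        self._records = records

    def collect(self):
        return list(self._records)


class Agent:
    """One agent per host: runs its drivers and dumps clean records to a queue."""

    def __init__(self, host, drivers, out_queue):
        self.host = host
        self.drivers = drivers
        self.out_queue = out_queue

    def sanitize(self, record, level):
        """Return a record conforming to the assumed data model, or None."""
        if any(field not in record for field in REQUIRED_FIELDS):
            return None
        try:
            value = float(record["value"])
        except (TypeError, ValueError):
            return None
        return {
            "host": self.host,
            "level": level,
            "name": record["name"],
            "metric": record["metric"],
            "value": value,
            "timestamp": record.get("timestamp", time.time()),
        }

    def run_once(self):
        """Collect from all drivers in parallel and queue the valid records."""
        workers = max(1, len(self.drivers))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            batches = list(pool.map(lambda d: (d.level, d.collect()), self.drivers))
        sent = 0
        for level, records in batches:
            for record in records:
                clean = self.sanitize(record, level)
                if clean is not None:
                    self.out_queue.put(clean)
                    sent += 1
        return sent


outbox = queue.Queue()
agent = Agent(
    "host-1",
    [
        StaticDriver("host", [{"name": "host-1", "metric": "load_average", "value": "0.7"}]),
        StaticDriver("container", [
            {"name": "web-1", "metric": "mem_usage", "value": 120},
            {"name": "web-1", "metric": "mem_cache"},  # no value: dropped by sanitize
        ]),
    ],
    outbox,
)
```

Calling `agent.run_once()` on this setup queues two records and drops the one without a value. Because the agent only puts records on the queue, it never waits on the engine.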

Monitoring controller
- Logical grouping of components.
- REST API to be connected via CLI, UI, or any other REST client.
- Driver-based storage module that uses any columnar database.
- IQ module that provides intelligent predictions.

Engine
- Aggregates stats and logs from different Docker hosts.
- Integrates with identity providers (like Keystone) to support multitenant deployments.
- Communicates with agents via asynchronous queues.
- Groups and processes data based on use cases.
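A rough sketch of the engine side, under stated assumptions: messages arrive on an in-process `queue.Queue` (standing in for the asynchronous queue), each message carries `host`, `level`, `metric`, and `value` fields, and tenancy is resolved through a plain host-to-tenant dictionary rather than a real identity provider. The class name `Engine` and its methods are hypothetical.

```python
import queue
from collections import defaultdict


class Engine:
    """Aggregates agent messages and groups them per tenant."""

    def __init__(self, in_queue, tenant_of_host):
        self.in_queue = in_queue
        # Assumption: a static mapping replaces the identity-provider lookup.
        self.tenant_of_host = tenant_of_host
        self.store = defaultdict(list)  # (tenant, level, metric) -> values

    def drain(self):
        """Consume every message currently queued; return how many were handled."""
        handled = 0
        while True:
            try:
                msg = self.in_queue.get_nowait()
            except queue.Empty:
                break
            tenant = self.tenant_of_host.get(msg["host"], "unknown")
            self.store[(tenant, msg["level"], msg["metric"])].append(msg["value"])
            handled += 1
        return handled

    def average(self, tenant, level, metric):
        """Mean of one metric for a tenant, or None if nothing was stored."""
        values = self.store.get((tenant, level, metric), [])
        return sum(values) / len(values) if values else None


inbox = queue.Queue()
for host, value in (("host-1", 10.0), ("host-2", 30.0), ("host-3", 50.0)):
    inbox.put({"host": host, "level": "container", "metric": "mem_usage", "value": value})

engine = Engine(inbox, {"host-1": "tenant-a", "host-2": "tenant-a", "host-3": "tenant-b"})
```

After `engine.drain()`, `engine.average("tenant-a", "container", "mem_usage")` combines the figures reported from host-1 and host-2, which is the kind of cross-host grouping the engine is meant to provide.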

IQ Module
- The logs and stats that are collected and stored make up a lot of unstructured data.
- Meaningful inferences from this data are of greater value to the user.
- Analytic tools like pandas and scipy are planned to be used to derive inferences.
- Error predictions, usage/load patterns, and capacity planning can be direct outputs.
- Suggestions regarding infrastructure would be an output of this module.
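As one possible example of a capacity-planning inference, the sketch below fits a straight line to evenly spaced usage samples and estimates how many more sampling intervals remain before a capacity limit is reached. It uses only the standard library instead of the pandas/scipy tooling the slides mention, to keep it dependency-free; the function names and sample values are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intervals_until_full(usage, capacity):
    """Estimate sampling intervals left until the trend reaches capacity.

    Returns None when usage is flat or shrinking, and 0.0 when the
    fitted trend has already reached the limit.
    """
    xs = list(range(len(usage)))
    slope, intercept = fit_line(xs, usage)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - xs[-1])


# Hypothetical mem_usage readings taken at regular intervals.
remaining = intervals_until_full([10, 20, 30, 40], capacity=100)  # 6.0 for this data
```

A linear trend is the simplest possible model; real usage patterns may call for seasonal or non-linear fitting.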

DEMO

Thank You.