Monitoring Cloud-Native applications with Prometheus Jacopo Nardiello CODEMOTION MILAN - SPECIAL EDITION 10 – 11 NOVEMBER 2017

Upload: codemotion

Post on 21-Jan-2018


Page 1: Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Codemotion Milan 2017

Monitoring Cloud-Native applications with Prometheus

Jacopo Nardiello

CODEMOTION MILAN - SPECIAL EDITION 10 – 11 NOVEMBER 2017

Page 2:

Jacopo Nardiello
SIGHUP Founder & DevOps Engineer
@jnardiello

~ whoami

Page 3:

~ ./stuff_I_poke_around_with

- Linux
- Kubernetes (cluster lifecycles and workload scheduling in general)
- The Cloud™ (VMs and Containers + other people's computers)
- golang
- More devops toys FTW! (CI/CDs, Ansible, etc.)

Page 4:

What exactly is “Cloud-Native”?

Page 5:

Cloud-Native is NOT The Cloud™

At its root, Cloud Native is structuring teams, culture and technology to utilize automation and architectures to manage complexity and unlock velocity.

Joe Beda

Page 6:

There’s a Copernican revolution happening in infrastructure

A fundamental shift: from VM-based, mutable infrastructures to highly dynamic, immutable infrastructures

Page 7:

The path to Cloud-Native Architectures

Page 8:

Why Containers

- A new infrastructural unit
- Atomic deployments
- Very small footprint, superfast scaling

Page 9:

Why Orchestrators

- Sandboxed environment
- Computers take over the scheduling
- Automatic health checks and self-healing

Page 10:

Cloud-Native is challenging

Page 11:

Cloud-Native monitoring with Prometheus

Page 12:

Overview: What is Prometheus?

Community-driven, open-source monitoring and alerting framework.

- Time series database for instrumentation, metrics collection, storage and querying
- Alerting entity
- Integrated tools for metrics exposure

Page 13:

Overview: A bit of context around Prometheus

Started in 2012 as a SoundCloud internal project

Second project to join CNCF after Kubernetes

Page 14:

Overview: Focus

Operational systems monitoring

Dynamic cloud environments

Page 15:

Core features

● Powerful no-sql query language, PromQL
● Time series data model
● Optimized to be efficient
● Operational & Architectural simplicity

Page 16:

Pull /metrics endpoints

Monitoring model: Pull
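
A minimal sketch of the target's side of the pull model, using only the Python standard library: the application keeps its metrics in memory and serves them as plain text on /metrics, and Prometheus fetches that page at every scrape interval. The metric name demo_requests_total and the helpers record_request, render_metrics and make_server are hypothetical, made up for this illustration; a real service would normally rely on an official client library.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

_lock = threading.Lock()
_requests_total = 0  # hypothetical application counter


def record_request() -> None:
    """Called by the application each time it handles a request."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP demo_requests_total Requests handled by this process.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; Prometheus pulls it on every scrape.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> HTTPServer:
    """Build (but do not start) the server; port 0 lets the OS pick a port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)
```

Calling make_server(8000).serve_forever() would expose the endpoint; the application never pushes anything, it only answers scrapes.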

Page 17:

Prometheus Architecture

Page 18:

The Architecture behind Prometheus

Page 19:

Prometheus core

- Service discovery and targets definition
- Metrics scraping
- Time series database
- Alerts and Recording rules
- Alerting evaluation
- Metrics query

Page 20:

Alertmanager

- Alerting & silencing
- Dispatching notifications to different channels

Page 21:

Exporters & SDKs

Formatting metrics so that they are exposed in the format Prometheus expects:

- Exporters for third-party systems (Node, RabbitMQ, MySQL, etc.)
- SDKs (client libraries) to export application metrics
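
What an exporter does can be sketched in a few lines: take statistics from a system that does not speak Prometheus and translate them into the text exposition format. This is only an illustration under assumptions: the metric name queue_messages and the stats dictionary are hypothetical, and real exporters are separate, maintained projects.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(name, help_text, metric_type, samples):
    """Render samples, given as (labels_dict, value) pairs, as exposition text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            rendered = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical statistics read from a message broker's own API.
broker_stats = {"orders": 12, "emails": 0}
samples = [({"queue": queue}, depth) for queue, depth in broker_stats.items()]
print(to_exposition("queue_messages", "Messages waiting in each queue.", "gauge", samples))
```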

Page 22:

Prometheus Basics

Page 23:

Prom Server configuration

- CLI flags configure the immutable daemon settings (changing them requires a restart)
- The config file defines scraping targets, instances and jobs

Page 24:

Prom Server configuration


prometheus.yml

    global:
      scrape_interval: 1m
      scrape_timeout: 30s
      external_labels:
        cluster: "test-cluster"

    rule_files:
      - rules/rules.yml

    # Scraping targets
    scrape_configs:
      - job_name: 'some-service'
        static_configs:
          - targets:
              - <host> or <dns>
            labels:
              app: "some-service"

Page 25:

/metrics

    # HELP hash_seconds Time taken to create hashes
    # TYPE hash_seconds histogram
    hash_seconds_bucket{code="200",le="1"} 2
    hash_seconds_bucket{code="200",le="2.5"} 2
    hash_seconds_bucket{code="200",le="5"} 2
    hash_seconds_bucket{code="200",le="10"} 2
    hash_seconds_bucket{code="200",le="+Inf"} 2
    hash_seconds_sum{code="200"} 9.370800000000002e-05
    hash_seconds_count{code="200"} 2
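
The histogram above reads more easily once you know the buckets are cumulative: each le ("less than or equal") bucket counts every observation up to its bound, which is why two very fast requests show up as 2 in all of them. A small sketch of how such lines could be produced; histogram_lines is a hypothetical helper written for this example, not part of any client library.

```python
import math


def _render_labels(labels):
    return ",".join(f'{key}="{value}"' for key, value in labels.items())


def histogram_lines(name, bounds, observations, labels=None):
    """Render cumulative le buckets, _sum and _count for a list of observations."""
    labels = dict(labels or {})
    lines = []
    for bound in sorted(bounds) + [math.inf]:
        le = "+Inf" if math.isinf(bound) else format(bound, "g")
        # Cumulative: count every observation at or below this bound.
        count = sum(1 for value in observations if value <= bound)
        rendered = _render_labels(dict(labels, le=le))
        lines.append(f"{name}_bucket{{{rendered}}} {count}")
    suffix = f"{{{_render_labels(labels)}}}" if labels else ""
    lines.append(f"{name}_sum{suffix} {sum(observations)}")
    lines.append(f"{name}_count{suffix} {len(observations)}")
    return lines


# Three made-up durations (seconds) spread over different buckets.
for line in histogram_lines("hash_seconds", [1, 2.5, 5, 10], [0.3, 2.0, 7.0], {"code": "200"}):
    print(line)
```

With these made-up values the le="1" bucket holds one observation, le="2.5" and le="5" hold two, and le="10" and le="+Inf" hold all three.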

Page 26:

Data model & querying

api_http_requests_total{method="POST", handler="/messages"}

- Label-based data model
- Each label and combination of labels is a dimension along which we can filter and aggregate exported data
- Changing, adding or removing a label will create a new time series
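
One way to picture the data model: a time series is identified by its metric name plus its full set of labels, so any change in the label set is a different key. The toy in-memory store below is an assumption made for illustration only (series_key and append_sample are hypothetical names); it says nothing about how Prometheus stores data internally.

```python
from collections import defaultdict


def series_key(name, labels):
    """Identity of a series: metric name + label set (label order is irrelevant)."""
    return (name, frozenset(labels.items()))


store = defaultdict(list)  # series key -> list of (timestamp, value)


def append_sample(name, labels, timestamp, value):
    store[series_key(name, labels)].append((timestamp, value))


append_sample("api_http_requests_total", {"method": "POST", "handler": "/messages"}, 0, 10)
append_sample("api_http_requests_total", {"handler": "/messages", "method": "POST"}, 60, 14)
# Adding a label (here a hypothetical "code") starts a brand new series.
append_sample(
    "api_http_requests_total",
    {"method": "POST", "handler": "/messages", "code": "200"},
    60,
    1,
)
print(len(store))  # number of distinct series
```

The first two samples land in the same series; the third creates a second one, which is why high-cardinality labels multiply the number of series.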

Page 27:

PromQL & Label based queries

http_requests_total
All time series of the metric http_requests_total

http_requests_total{code="200",method="get"}
The time series of http_requests_total for successful requests (code 200) with method get

http_requests_total{code="200",method="get"}[5m]
The same selection as a range vector: for each series, all samples from the last 5 minutes
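
Label selection can be mimicked in a few lines. PromQL has four matcher operators: = and != compare strings, =~ and !~ match regular expressions that are fully anchored. The sketch below is a rough model with stated assumptions: it uses Python's re module, whose syntax differs in places from the RE2 syntax Prometheus uses, and the functions matches and select are hypothetical names.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every (name, operator, value) matcher."""
    for name, op, pattern in matchers:
        value = labels.get(name, "")  # a missing label behaves like an empty string
        if op == "=":
            ok = value == pattern
        elif op == "!=":
            ok = value != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, value) is not None
        elif op == "!~":
            ok = re.fullmatch(pattern, value) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


def select(series, matchers):
    return [labels for labels in series if matches(labels, matchers)]


series = [
    {"code": "200", "method": "get"},
    {"code": "200", "method": "post"},
    {"code": "404", "method": "get"},
]
print(select(series, [("code", "=", "200"), ("method", "=", "get")]))
print(select(series, [("code", "!~", "4..")]))
```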

Page 28:

PromQL & Label based queries

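http_requests_total{status!~"^4..$"}
Filtering with regexes: !~ is a negative match, so this selects the time series whose status is not a 4xx code (use =~ to select only the 4xx series). Prometheus regexes are fully anchored, so ^ and $ are optional.

sum(rate(http_requests_total[5m])) by (job)
Applying functions: rate() turns the 5-minute range vector into a per-second rate for each series, then sum ... by (job) aggregates those rates per job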
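
To make the two steps of sum(rate(...)) by (job) concrete, here is a deliberately simplified model. Assumption to keep in mind: the real rate() also handles counter resets and extrapolates to the window boundaries, while this sketch only divides the increase between the first and last sample by the elapsed time. All names and sample values are made up.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample."""
    if len(samples) < 2:
        return None  # not enough data points in the window
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def sum_rate_by_job(range_vector):
    """range_vector: list of (labels, samples). Returns {job: summed rate}."""
    totals = defaultdict(float)
    for labels, samples in range_vector:
        rate = simple_rate(samples)
        if rate is not None:
            totals[labels["job"]] += rate
    return dict(totals)


# Hypothetical counters sampled over a 5-minute (300 s) window.
window = [
    ({"job": "api", "instance": "a"}, [(0, 0), (300, 300)]),
    ({"job": "api", "instance": "b"}, [(0, 100), (300, 250)]),
    ({"job": "web", "instance": "c"}, [(0, 0), (300, 600)]),
]
print(sum_rate_by_job(window))
```

The instance label disappears in the result because the aggregation keeps only the job dimension.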

Page 29:

Prometheus web interface

Page 30:

Visualization

Plotting and graphing are out of Prometheus' scope.

Use Grafana

Page 31:

Alerting Rules

- Evaluated by the Prometheus server on a regular basis
- If a certain query matches a condition, the alert is triggered

    ALERT InstanceDown
      IF up == 0
      FOR 5m
      LABELS { severity = "critical" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.",
      }

This is the syntax used until Prometheus 1.8. Starting from Prometheus v2, rules are written in standard YAML (the structure stays the same).
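
The FOR clause is the part that tends to surprise people: the alert does not fire the first time the condition is true, it stays pending until the condition has held for the whole duration. A rough model of that behaviour, assuming one evaluation per minute; alert_states is a hypothetical helper, not Prometheus code.

```python
def alert_states(evaluations, for_seconds):
    """evaluations: list of (timestamp, condition_is_true). Returns one state per evaluation."""
    states = []
    active_since = None
    for timestamp, condition in evaluations:
        if not condition:
            active_since = None  # the condition cleared, start over
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# up == 0 becomes true at t=60 and stays true until the target recovers at t=420.
evaluations = [(0, False), (60, True), (120, True), (180, True),
               (240, True), (300, True), (360, True), (420, False)]
for (timestamp, _), state in zip(evaluations, alert_states(evaluations, for_seconds=300)):
    print(timestamp, state)
```

Only firing alerts are meant to reach the Alertmanager; a short blip that clears before the 5 minutes are up never leaves the pending state.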

Page 32:

Alert Dispatching

The job of the Alertmanager is to dispatch alerts to the right channel according to their severity
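
Dispatching by severity boils down to matching the alert's labels against an ordered list of routes. The sketch below is a toy model under assumptions: the receiver names are hypothetical, and the real Alertmanager routing tree is richer (nested routes, grouping, inhibition, silences).

```python
# Ordered routes: (labels that must match, receiver). Receiver names are made up.
ROUTES = [
    ({"severity": "critical"}, "pager"),
    ({"severity": "warning"}, "chat"),
]
DEFAULT_RECEIVER = "email"


def pick_receiver(alert_labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first route whose labels all match the alert."""
    for match, receiver in routes:
        if all(alert_labels.get(key) == value for key, value in match.items()):
            return receiver
    return default


print(pick_receiver({"alertname": "InstanceDown", "severity": "critical"}))
print(pick_receiver({"alertname": "DiskAlmostFull", "severity": "warning"}))
print(pick_receiver({"alertname": "Unlabelled"}))
```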

Page 33:

Cloud-Native monitoring

Page 34:

Service discovery

Scraping statically defined targets is not very useful in dynamic environments

kubernetes_sd_config
Native integration for Kubernetes environments

- Prometheus is aware of running in a Kubernetes cluster
- Automatically retrieves scraping targets such as nodes, pods, containers from the k8s API

Page 35:

More integrations (many more…)

- ec2_sd_config
- azure_sd_config
- openstack_sd_config
- gce_sd_config
- kubernetes_sd_config
- consul_sd_config
- dns_sd_config
- file_sd_config
- marathon_sd_config
- nerve_sd_config
- triton_sd_config
- static_config
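
Of the mechanisms listed above, file_sd_config is the easiest one to feed from your own tooling: Prometheus watches a file containing a list of target groups, each with "targets" and optional "labels". A sketch of a script that writes such a file as JSON; the addresses, labels and the write_file_sd helper are hypothetical, and the file is replaced atomically so that Prometheus never reads a half-written version.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """groups: list of (targets, labels) pairs, written as file_sd target groups."""
    payload = [
        {"targets": sorted(targets), "labels": labels}
        for targets, labels in groups
    ]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_path, path)  # atomic swap on the same filesystem


if __name__ == "__main__":
    # Hypothetical inventory, e.g. the result of querying an internal CMDB.
    write_file_sd(
        "targets.json",
        [(["10.0.0.2:9100", "10.0.0.1:9100"], {"job": "node", "env": "test"})],
    )
```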

Page 36:

Re-labeling

- Relabeling is a very powerful mechanism that allows us to further manipulate the labels of the targets.
- It’s a very effective way to take the targets returned by an API and apply sophisticated targeting strategies (e.g. manipulating addresses or ports, filtering a subset of targets, etc.)

A quick configuration example:

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
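
What the keep action in this configuration does can be modelled as a filter: the values of the source labels are joined with a separator (";" by default), and a target survives only if the fully anchored regex matches the result. The sketch below assumes Python's re module in place of RE2, and apply_keep plus the example pods are made up for illustration.

```python
import re


def apply_keep(targets, source_labels, regex, separator=";"):
    """Keep only the targets whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


ANNOTATION = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
discovered = [
    {"__address__": "10.1.0.4:8080", ANNOTATION: "true"},
    {"__address__": "10.1.0.5:8080"},                        # annotation missing
    {"__address__": "10.1.0.6:8080", ANNOTATION: "trueish"},  # not a full match
]
for target in apply_keep(discovered, [ANNOTATION], "true"):
    print(target["__address__"])
```

Only the first pod is kept, which is how annotating a pod with prometheus.io/scrape: "true" opts it into scraping under this configuration.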

Page 37:

Demo Time!

Page 38:

Thank you, Questions?

We are [email protected]

@jnardiello