![Page 1: Observing Enterprise Kubernetes Clusters At Scalecontinuouslifecycle.london/wp-content/uploads/2019/01/Joe-Salisbur… · All Kubernetes clusters completely managed 4 - ~35 people](https://reader035.vdocuments.net/reader035/viewer/2022070806/5f0466837e708231d40dc948/html5/thumbnails/1.jpg)
Observing Enterprise Kubernetes Clusters At Scale
Joe Salisbury (@salisbury_joe)
Product Owner - Internal Platform Team
How do we empower Product teams?
Giant Swarm manages Kubernetes clusters for enterprises
Control plane for managing Kubernetes clusters
All Kubernetes clusters completely managed
Scale
- ~35 people
- 100s of Clusters
- 1000s of Nodes
- EU, USA, China
Providers
- AWS
- Azure
- On-Prem
Giant Swarm takes care of your infrastructure
You focus on your business value
Fully managed == Responsible for everything
What is Everything?
- Managed Apps
- Kubernetes
- Actual Infrastructure
Responsible for everything == Monitoring for everything
Observing Kubernetes
Monitoring Domains
- Metrics
- Logging
- Tracing
Logging
- EFK stack
- Mainly used for deep debugging after the fact
- Looking at Loki for the future
  - Lighter, Prometheus / Grafana integration
Tracing
- Looking at Jaeger
- Helpful for our API services (request-response)
  - Tip of the iceberg
  - Most likely will kill these in future
- Still researching tracing for operators
  - Async background processing
  - Lots of small traces
Metrics -> Prometheus
Our Prometheus Journey
- Present
- Pains
- Plans
Present
Monitoring is an evolutionary process
[Diagram: Control Plane (API, Operators, Monitoring) and three Tenant Clusters (each with API Server, Kubelets, etc.)]
‘We need to monitor clusters’
- We have a Prometheus server running on the control plane - we can use it to monitor all the tenant clusters!
- This was maybe a good idea at the time
- Dependencies
  - Tenant clusters routable from the control plane
  - Peering / IPAM
[Diagram: VPC addressing]
- control plane: 10.0.0.0/16 (10.0.0.0 -> 10.0.255.255)
- tenant clusters: 10.1.0.0/16 (10.1.0.0 -> 10.1.255.255), /24 mask
- Control Plane VPC: 10.0.0.0/16
- Tenant Cluster VPCs: 10.1.0.0/24, 10.1.1.0/24, 10.1.2.0/24
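The addressing scheme in the diagram can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the ranges shown on the slide; the `allocate` helper is hypothetical and not part of any Giant Swarm tooling.

```python
import ipaddress

# Ranges taken from the diagram.
control_plane = ipaddress.ip_network("10.0.0.0/16")
tenant_pool = ipaddress.ip_network("10.1.0.0/16")

# Carve the tenant pool into /24 blocks, one per tenant cluster VPC.
tenant_subnets = list(tenant_pool.subnets(new_prefix=24))


def allocate(index: int) -> ipaddress.IPv4Network:
    """Return the /24 for the index-th tenant cluster (hypothetical helper)."""
    return tenant_subnets[index]


first_three = [str(allocate(i)) for i in range(3)]
print(first_three)          # ['10.1.0.0/24', '10.1.1.0/24', '10.1.2.0/24']
print(len(tenant_subnets))  # 256

# Peering only works if no tenant range overlaps the control plane range.
print(any(net.overlaps(control_plane) for net in tenant_subnets))  # False
```

With a /16 pool and a /24 per cluster, the scheme as drawn leaves room for 256 tenant cluster VPCs per pool.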
- Configuration
  - Automatically adding tenant clusters to Prometheus
prometheus-config-controller
- Sidecar for Prometheus
- Watches for Kubernetes Custom Resources
- Updates Prometheus ConfigMap
- Fetches certificates, shares via emptyDir
- Reloads Prometheus on changes
[Diagram: prometheus-config-controller, Clusters and Certificates (stacks of boxes labelled "Chartconfig CR"), Prometheus ConfigMap, Certificate Volume and prometheus; edges labelled watches, reads, syncs, reads, reloads]
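The watch / sync / reload loop described for prometheus-config-controller can be sketched roughly as follows. This is not the actual controller: the cluster dict keys (`id`, `api`), the job naming and the `/certs` mount path are assumptions for illustration. The scrape config field names and the `/-/reload` endpoint (available when Prometheus runs with `--web.enable-lifecycle`) are standard Prometheus.

```python
import hashlib
import json
import urllib.request


def render_scrape_configs(clusters):
    """Build one scrape job per tenant cluster.

    `clusters` is a list of dicts with hypothetical keys `id` and `api`.
    Certificate paths assume a shared emptyDir mounted at /certs.
    """
    jobs = []
    for cluster in sorted(clusters, key=lambda c: c["id"]):
        cert_dir = f"/certs/{cluster['id']}"
        jobs.append({
            "job_name": f"tenant-{cluster['id']}-apiserver",
            "scheme": "https",
            "tls_config": {
                "ca_file": f"{cert_dir}/ca.pem",
                "cert_file": f"{cert_dir}/crt.pem",
                "key_file": f"{cert_dir}/key.pem",
            },
            "static_configs": [{"targets": [cluster["api"]]}],
        })
    return {"scrape_configs": jobs}


def digest(config):
    """Stable hash of a config, used to decide whether a reload is needed."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def reload_prometheus(base_url="http://localhost:9090"):
    """Ask Prometheus to re-read its config (needs --web.enable-lifecycle)."""
    request = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    urllib.request.urlopen(request, timeout=5)


def sync(clusters, last_digest, reload=reload_prometheus):
    """Render the config and reload only when it changed. Returns the new digest."""
    new_digest = digest(render_scrape_configs(clusters))
    if new_digest != last_digest:
        # A real controller would write the ConfigMap here before reloading.
        reload()
    return new_digest
```

Hashing the rendered config keeps reloads to the cases where a cluster was actually added, removed or changed.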
[Diagram: Control Plane (Prometheus) and three Tenant Clusters (each with API Server, Kubelets, etc.)]
also add node-exporter, ingress-controllers, coredns, custom exporters, all the control plane services, the kitchen sink...
- AlertManager & OpsGenie
- Heartbeats for each installation
  - Always firing alert in Prometheus
  - Special routing to OpsGenie in AlertManager
  - Heartbeat support in OpsGenie (page if no ping)
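The heartbeat idea is a dead man's switch: the always-firing alert keeps pinging, and silence is what triggers the page. A toy model of the receiving side, assuming a hypothetical 10 minute timeout and installation names; it does not use the OpsGenie API.

```python
import time


class HeartbeatMonitor:
    """Toy stand-in for the receiving side of a heartbeat (not the OpsGenie API)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_ping = {}

    def ping(self, installation):
        """Record that the always-firing alert arrived for this installation."""
        self.last_ping[installation] = self.clock()

    def expired(self):
        """Installations whose last ping is older than the timeout: page for these."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_ping.items()
            if now - seen > self.timeout_seconds
        )


# Usage with a fake clock so the example is deterministic.
now = [0]
monitor = HeartbeatMonitor(timeout_seconds=600, clock=lambda: now[0])
for name in ("installation-1", "installation-2", "installation-3"):
    monitor.ping(name)

now[0] = 500
monitor.ping("installation-1")
monitor.ping("installation-3")  # installation-2 has gone quiet

now[0] = 700
print(monitor.expired())  # ['installation-2']
```

The useful property is that a dead Prometheus, a dead AlertManager and a broken network path all look the same to the receiver: no ping.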
[Diagram: Installation 1 (prometheus, alertmanager), Installation 2 (alertmanager only), Installation 3 (prometheus, alertmanager); caption: "Installation 2 is down, ding ding ding"]
And it works!
- In production for most of 2018, and a fair chunk of 2019 now
- Added more targets, some improvements, but no major architectural changes
Pains
Roll for Initiative
Prometheus Memory Usage
- Number of clusters correlates (ish) with number of series
- Number of series correlates with memory usage
- Currently forced to scale vertically
- Fine for now, but not where we want to be in the future
- We want to enable developers to add tons of metrics
- Trend will only continue
Prometheus v2.9.1 (from v2.6.0)
- Go 1.12!
- Outgrown / outgrowing our initial assumption that customers would run a handful of small tenant clusters
- We can drop metrics we don’t need (e.g. cadvisor for customer workloads) as needed
- But, not a long term solution
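Dropping unneeded series is normally done in Prometheus itself with `metric_relabel_configs` and a `drop` action. The filter below only mimics that logic in Python to show what gets kept. The `container_` name prefix for cAdvisor series and the set of namespaces treated as "ours" are assumptions, not the talk's actual rules.

```python
import re

# Assumption: cAdvisor series are recognised by the `container_` name prefix.
CADVISOR_NAME = re.compile(r"container_.*")
# Hypothetical set of namespaces that belong to the platform, not to customers.
PLATFORM_NAMESPACES = {"kube-system", "giantswarm"}


def keep(sample):
    """Keep everything except cAdvisor series from customer workloads."""
    # fullmatch mirrors Prometheus relabelling, where regexes are fully anchored.
    is_cadvisor = CADVISOR_NAME.fullmatch(sample["__name__"]) is not None
    return not is_cadvisor or sample.get("namespace") in PLATFORM_NAMESPACES


samples = [
    {"__name__": "container_memory_usage_bytes", "namespace": "kube-system"},
    {"__name__": "container_memory_usage_bytes", "namespace": "shop"},
    {"__name__": "kube_pod_status_phase", "namespace": "shop"},
]
kept = [s for s in samples if keep(s)]
print(len(kept))  # 2
```

This trims the series count but does not change the shape of the curve, which matches the point that it is not a long term solution.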
Reliability
- If the Prometheus server goes down, we lose monitoring for all tenant clusters
- We can have a better failure mode
  - e.g. lose monitoring for some percentage of tenant clusters
Querying
- Having separate installations is great most of the time
- Pain in the ass for querying
  - Digging into a global view
  - Have to look at multiple Grafanas
- Percentage of data we see will decrease over time (human patience is a constant)
Plans
A collection of ideas for the future
Goal for 2019 is to improve the scalability of our metrics infrastructure
Addressing Prometheus Scaling
- If we can’t scale vertically, let’s scale horizontally!
- One Prometheus per tenant cluster (at least)
- prometheus-operator
  - Use building blocks!
- Build a new operator that watches our Cluster CRs, ensures CRs for prometheus-operator
[Diagram: prometheus-config-operator watches Cluster CRs and ensures Prometheus CRs; prometheus-operator watches Prometheus CRs and ensures Prometheus servers]
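The "watches Cluster CRs, ensures Prometheus CRs" step is a reconcile loop: compute the desired Prometheus CRs from the clusters that exist, then diff against what is already there. A minimal sketch, assuming a hypothetical naming scheme and label key; `monitoring.coreos.com/v1` and kind `Prometheus` are the prometheus-operator resource, everything else here is illustrative.

```python
def desired_prometheus(cluster_id, shards=1):
    """Prometheus CR manifests for one tenant cluster (names are hypothetical)."""
    return [
        {
            "apiVersion": "monitoring.coreos.com/v1",
            "kind": "Prometheus",
            "metadata": {
                "name": f"{cluster_id}-prometheus-{shard}",
                "labels": {"example.io/cluster": cluster_id},  # hypothetical label
            },
            "spec": {
                "replicas": 1,
                "externalLabels": {"cluster_id": cluster_id, "shard": str(shard)},
            },
        }
        for shard in range(shards)
    ]


def reconcile(cluster_ids, existing_names):
    """Return (manifests to create, names to delete) for the current state."""
    desired = {
        manifest["metadata"]["name"]: manifest
        for cluster_id in cluster_ids
        for manifest in desired_prometheus(cluster_id)
    }
    desired_names = set(desired)
    existing = set(existing_names)
    to_create = [desired[name] for name in sorted(desired_names - existing)]
    to_delete = sorted(existing - desired_names)
    return to_create, to_delete


create, delete = reconcile(["a", "b"], {"a-prometheus-0", "old-prometheus-0"})
print([m["metadata"]["name"] for m in create])  # ['b-prometheus-0']
print(delete)                                   # ['old-prometheus-0']
```

The `shards` parameter is there because the topology lives in one function: moving a large cluster to several Prometheus CRs would be a change in one place.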
[Diagram: Control Plane running several Prometheus servers, and three Tenant Clusters (each with API Server, Kubelets, etc.)]
Codify our Prometheus topology in one service
- Provide one feature with one service
- Provide / use building blocks / abstraction layers
- Codify business logic in one operator
- We may need to support multiple Prometheus servers per Kubernetes cluster (for gargantuan clusters)
- We can transition into it
  - e.g. prometheus-config-operator can create multiple Prometheus CRs for one tenant cluster
- Benefit of having topology codified in one operator
- Sharding Prometheus allows us to scale horizontally
- Increases scalability and reliability
  - Can scale control plane horizontally
  - Failure modes are better
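The "better failure modes" claim is simple arithmetic: the share of tenant clusters that go dark when a server fails depends on how many clusters that server scrapes. A small sketch with a made-up fleet of 100 tenant clusters.

```python
def monitoring_lost(clusters_per_server, failed_servers):
    """Fraction of tenant clusters left unmonitored when the given servers fail."""
    total = sum(clusters_per_server.values())
    lost = sum(clusters_per_server[name] for name in failed_servers)
    return lost / total


# Hypothetical fleet of 100 tenant clusters.
single = {"prometheus": 100}
sharded = {f"prometheus-{i}": 1 for i in range(100)}

print(monitoring_lost(single, ["prometheus"]))      # 1.0
print(monitoring_lost(sharded, ["prometheus-42"]))  # 0.01
```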
Global Observability
- Still early days
- Let’s try Cortex!
  - All Prometheus servers use remote write to write to a Cortex backend
  - Use Cortex for global querying (one Grafana to rule them all)
- Keep alerting at installation level
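For global querying to work, every Prometheus that writes to the shared backend has to stamp its series with labels that identify where they came from. The sketch below builds the relevant config fragment as a Python dict. `external_labels` and `remote_write` are standard Prometheus config keys; the label names and the push URL are assumptions, and the exact Cortex push path depends on the Cortex version in use.

```python
def remote_write_fragment(installation, cluster_id, push_url):
    """Config fragment for one Prometheus server writing to a shared backend."""
    return {
        "global": {
            # External labels are attached to every series sent via remote write.
            "external_labels": {
                "installation": installation,  # hypothetical label name
                "cluster_id": cluster_id,      # hypothetical label name
            },
        },
        "remote_write": [{"url": push_url}],
    }


def labels_are_unique(fragments):
    """True if no two servers would write series with identical external labels."""
    seen = [tuple(sorted(f["global"]["external_labels"].items())) for f in fragments]
    return len(seen) == len(set(seen))


PUSH_URL = "http://cortex.example/push"  # placeholder, not a real endpoint
fleet = [
    remote_write_fragment("eu-1", "a", PUSH_URL),
    remote_write_fragment("eu-1", "b", PUSH_URL),
    remote_write_fragment("us-1", "a", PUSH_URL),
]
print(labels_are_unique(fleet))  # True
```

Alerting rules stay in each installation's Prometheus, so a backend outage affects dashboards but not paging.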
Empowerment
What does this help us do in the future?
Giant Swarm builds and operates one product
No custom infrastructure
Feedback loop: Detect, Fix, Deploy
- Monitoring to detect
- Postmortems to fix
- Pipeline to deploy
Learnings from one installation rolled out to all customers
- Monitoring enables this feedback loop
- Improving monitoring improves this feedback loop
- Kind of the point of an internal platform team :D
Good observability is not just reactive
Aim to work proactively
What questions do you have?
Tobias is doing a workshop tomorrow!
Bam!
Thank you!
Joe Salisbury (@salisbury_joe)
- e.g. Adidas reports issue with 95th percentile DNS latency
  - Add alerting for high 95th percentile DNS latency
  - Improve DNS dashboard to better show distribution
  - Update default CoreDNS configuration to mitigate (autopath)
  - Fix lib-musl issue (don’t use the library)
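Why alert on the 95th percentile and show the distribution: a tail problem can be invisible in the median. A small sketch using the nearest-rank percentile on made-up latencies; the sample values and the alert threshold are hypothetical, not data from the incident.

```python
def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, so there is no float rounding to worry about.
    rank = (q * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical DNS lookups: 90 fast ones and 10 that hit a slow path (milliseconds).
latencies_ms = [2] * 90 + [1000] * 10

print(percentile(latencies_ms, 50))  # 2
print(percentile(latencies_ms, 95))  # 1000

ALERT_THRESHOLD_MS = 500  # hypothetical
should_alert = percentile(latencies_ms, 95) > ALERT_THRESHOLD_MS
print(should_alert)  # True
```

The median says everything is fine; the 95th percentile is what one in twenty requests actually experiences.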