Slides: events.static.linuxfound.org/sites/events/files/slides/more containers...
Ed Rooth | @sym3tri | [email protected] | coreos.com
More Containers, More Problems
Agenda
1. Define problems
2. Define vision of the solution
3. How CoreOS is building solutions
4. How you can get started
It all started with... a server
Then we got... many servers
Then we got... VMs on our servers
Then we got... APIs around hosted VMs (cloud)
Which led to... even more servers
The cloud made booting servers really easy.
Also… Moore’s law is still a thing.
Too Many Servers!
More Servers, More Problems
- Patching is hard
- Dependency management is hard
- Managing access is hard
- Managing workloads is hard
- App lifecycle management is hard
- Identifying security issues is hard
More Servers == More Sysadmins
[Chart: Servers and Sysadmins, scale 0 to 1000]
Google needed more servers… before the rest of us did.
They solved many of these problems internally, and published some great papers.
We started building it
CoreOS, Google, and the community...
are building the open-source version.
#GIFEE
Google’s Infrastructure For Everyone Else
What is #GIFEE?
"Fundamentally, it's what happens when you ask a software engineer to design an operations function."
-- Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE
Google’s Infrastructure
Servers are not your pets
Servers are the new CPU Cores
Clusters are the new servers
What is #GIFEE?
Evolution of Servers
[Diagram: Server, then Cluster, then Clusters]
Layer                   Google         Open Source
Process                 App
Operating System        Custom Linux   CoreOS Linux
Distributed Consensus   Chubby         etcd
Cluster Manager         Borg           Kubernetes
Monitoring              BorgMon        Prometheus
RPC framework           Stubby         gRPC
Auth                    private        Dex
“cluster operating system”
Orchestration
State
Scheduler: Gets work to the servers
OS for Clusters
Software manages servers
Software manages workloads
Declare what you want, it will become so
What is #GIFEE?
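The "scheduler gets work to the servers" idea can be sketched in a few lines. This is a toy illustration only, not how Kubernetes actually schedules: the `schedule` function, the node names, and the CPU numbers are all hypothetical, and the placement rule (pick the node with the most free CPU) is an assumption chosen for simplicity.

```python
from typing import Dict, Optional


def schedule(pods: Dict[str, int], capacity: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each pod (name -> CPU request) to the node with the most free CPU.

    Returns pod name -> node name, or None when no node has room.
    All names and numbers are hypothetical.
    """
    free = dict(capacity)  # remaining CPU per node
    placement: Dict[str, Optional[str]] = {}
    for pod, request in pods.items():
        # Most free CPU wins; ties go to the alphabetically first node name.
        best = max(sorted(free), key=lambda node: free[node], default=None)
        if best is None or free[best] < request:
            placement[pod] = None  # stays pending until capacity appears
            continue
        free[best] -= request
        placement[pod] = best
    return placement


if __name__ == "__main__":
    nodes = {"worker-1": 4, "worker-2": 2}
    work = {"web": 2, "db": 2, "cache": 1, "batch": 3}
    for pod, node in schedule(work, nodes).items():
        print(pod, "->", node or "pending")
```

The caller only declares what should run and how much it needs; which server ends up running it is the scheduler's decision.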
[Diagram: worker nodes, each running a kubelet, managed by API + scheduler nodes]
works on 1 node too: API + scheduler + worker
Primary component of the Cluster OS
Fits our vision
Started by Google with over 10 yrs experience running Borg
Centralized administration & orchestration
No more SSH
Yes, that even means your favorite config mgmt tool
What is #GIFEE?
Don’t say HOW
$ scp myapp host:/opt
$ ssh host systemd-run /opt/myapp
say WHAT
$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1
$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3
say WHAT again
$ kubectl scale rc myapp --replicas=4
$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3
myapp-f839d   10.2.29.4
myapp-98b35   10.2.29.5
myapp-e40ee   10.2.29.8
say WHAT one more time
$ kubectl run myapp --image=quay.io/sym3tri/hello --replicas=1
$ kubectl get pods
POD           IP
myapp-97wt8   10.2.29.3
RC web-prod: select(env=prod, app=web), count=1
- Pod (env=prod, app=web)
RC web-prod: select(env=prod, app=web), count=4
- Pod (env=prod, app=web)
- Pod (env=prod, app=web)
- Pod (env=prod, app=web)
- Pod (env=prod, app=web)
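The replication controller (RC) slides boil down to a label selector plus a desired count. The following is a minimal sketch of that reconciliation idea, assuming an in-memory list of pods; the `Pod`, `ReplicationController`, `matches`, and `reconcile` names are hypothetical and do not reflect the Kubernetes API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]


@dataclass
class ReplicationController:
    name: str
    selector: Dict[str, str]
    replicas: int


def matches(pod: Pod, selector: Dict[str, str]) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(pod.labels.get(key) == value for key, value in selector.items())


_suffix = count(1)  # hypothetical source of unique pod name suffixes


def reconcile(rc: ReplicationController, pods: List[Pod]) -> List[Pod]:
    """Return a pod list in which exactly rc.replicas pods match the selector."""
    owned = [p for p in pods if matches(p, rc.selector)]
    others = [p for p in pods if not matches(p, rc.selector)]
    while len(owned) < rc.replicas:  # too few: start more
        owned.append(Pod(f"{rc.name}-{next(_suffix)}", dict(rc.selector)))
    return others + owned[: rc.replicas]  # too many: drop the extras


if __name__ == "__main__":
    rc = ReplicationController("web-prod", {"env": "prod", "app": "web"}, replicas=1)
    pods = reconcile(rc, [])
    rc.replicas = 4  # change WHAT is wanted, then reconcile again
    pods = reconcile(rc, pods)
    print([p.name for p in pods])
```

Scaling from 1 to 4 only changes the declared count; the loop works out which pods to add or remove.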
automated != automatic
Dependencies are isolated per app
Apps automatically migrate throughout the cluster
What is #GIFEE?
All apps are “12-factor”
Configuration/Secret management
What is #GIFEE?
prod config
staging config
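One common reading of "12-factor" configuration is that the same app image runs everywhere and only the injected settings differ between prod and staging. A minimal sketch under that assumption follows; the variable names `APP_ENV`, `DATABASE_URL`, and `DEBUG` and the `load_config` helper are hypothetical.

```python
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from the environment so one build runs in any stage.

    The variable names here are illustrative, not a standard.
    """
    return {
        "stage": env.get("APP_ENV", "staging"),
        "db_url": env["DATABASE_URL"],  # required: fail fast when missing
        "debug": env.get("DEBUG", "false").lower() == "true",
    }


if __name__ == "__main__":
    prod = load_config({"APP_ENV": "prod", "DATABASE_URL": "postgres://prod-db/app"})
    staging = load_config({"DATABASE_URL": "postgres://staging-db/app", "DEBUG": "true"})
    print(prod)
    print(staging)
```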
Consistent Deployment API
Deploy canary builds and experiments
Rolling Updates
What is #GIFEE?
Load Balanced Service: app v1, app v1, app v1, app v1
Load Balanced Service: app v1, app v1, app v1, app v1, app v2
Load Balanced Service: app v1, app v1, app v1, app v2, app v2
Load Balanced Service: app v1, app v1, app v2, app v2, app v2
Load Balanced Service: app v2, app v2, app v2, app v2
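A rolling update like the one in these frames can be sketched as a loop that starts one new instance, then retires one old instance, until no old instances remain. This is one possible strategy (surge by one), written as an assumption rather than as the exact sequence Kubernetes follows; `rolling_update` is a hypothetical helper.

```python
from typing import Iterator, List


def rolling_update(replicas: int, old: str, new: str) -> Iterator[List[str]]:
    """Yield the running versions after each step of a surge-by-one rollout."""
    if old == new:
        raise ValueError("old and new versions must differ")
    running = [old] * replicas
    yield list(running)
    while old in running:
        running.append(new)  # start one new instance first...
        yield list(running)
        running.remove(old)  # ...then retire one old instance
        yield list(running)


if __name__ == "__main__":
    for step in rolling_update(4, "v1", "v2"):
        print("Load Balanced Service:", ", ".join(step))
```

At every step at least four instances stay behind the load-balanced service, so traffic is served throughout the rollout.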
A Team, B Team, C Team
What is #GIFEE?
Mixed workloads (staging + prod)
Logically partitioned resources
Trusted & Secure from the bottom up*
Only trusted code is executed
What is #GIFEE?
Cluster OS
Container Runtime
OS
Firmware & TPM
Every {human, machine, process} is… authenticated & authorized
All communication is encrypted
What is #GIFEE?
Failure is expected and handled for…
- Services / Apps
- Machines
- Storage
- Clusters
- Regions
What is #GIFEE?
Logging
Monitoring / Alerting
What is #GIFEE?
Compatibility with existing tools
Work with other projects (Docker, Calico, Prometheus)
Incorporates lessons learned
#GIFEE vs Google Infra?
Build for scale
Manage your apps, not servers
High Availability
New paradigm of infra/development
Why?
We believe:
As #GIFEE becomes ubiquitous, the Internet becomes more secure overall
#GIFEE and Security
Secure the Internet
CoreOS Mission
Journey to #GIFEE
Leverage prior work + standards
- Raft
- Omaha Protocol
- OIDC
Getting Started
Start from the bottom
The Operating System
Securing The Internet
Minimal Server OS + Automatic Updates
Requires:
- Distributed consensus
- Containers
- Cluster computing
Securing The Internet
In this new world we containerize all the things…
Containerize
but…
Containerize
“Every solution breeds new problems”
-Arthur Bloch
Solve one problem → another problem appears
More Containers, More Problems
Problem #1
- Secure & controlled container distribution
Solution
Problem #2
- Docker security model
- Docker coupling of components
Solution
[Diagram: process models. Labels: systemd, app, docker run redis, docker engine daemon]
Side Note: Spec vs Implementation
Specification: https://en.wikipedia.org/wiki/ISO_668
Implementation:
Problem #3
- User Authentication
Solution
- Dex
Problem #4
- Really big containers
Solution
- Go
- Buildroot
- acbuild for ACIs
github.com/brianredbeard/minimal_containers
Your container is 500MB!?
NOOOOOOOOO!!!
Problems #5-11
- Co-locating Containers
- Intelligent Scheduling
- Port Management
- Segmenting workloads
- Configuration Management
- Secrets Management
- Inconsistent Deployments
Solution
Problem #12 Networking
- Too many types of SDNs
- IP per pod
Solution
- CNI
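"IP per pod" means each pod gets its own address from a range set aside for pods, as in the 10.2.29.x addresses shown in the kubectl output. A minimal sketch of that allocation, assuming a single hypothetical pod subnet and a hypothetical `assign_pod_ips` helper, follows; it does not implement CNI, and real IPAM plugins typically reserve some addresses.

```python
import ipaddress
from typing import Dict, Iterable


def assign_pod_ips(pods: Iterable[str], subnet: str) -> Dict[str, str]:
    """Give every pod its own address from the subnet, in order.

    Raises RuntimeError when the subnet runs out of usable addresses.
    """
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    assigned: Dict[str, str] = {}
    for pod in pods:
        addr = next(hosts, None)
        if addr is None:
            raise RuntimeError(f"no free address left in {subnet} for {pod}")
        assigned[pod] = str(addr)
    return assigned


if __name__ == "__main__":
    print(assign_pod_ips(["myapp-a", "myapp-b", "myapp-c"], "10.2.29.0/24"))
```

Because no two pods share an address, each app can listen on its usual port without host-level port juggling.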
Problem #13
- Metrics
- Monitoring
- Alerting
Solution
- Prometheus
Problem #14
- Vulnerabilities inside containers
Solution
Problem #15
- Visualize & configure clusters
Solution
- Tectonic Console
Problem #16
- Running on Bare Metal
Solution
- Ignition
- coreos-baremetal
- Tectonic baremetal installer
Problem #17
- Inability to verify node trust
Solution
- Distributed Trusted Computing (DTC)
Problem #18
- Persistent storage
Solution
- Torus
Kubernetes is the kernel, Tectonic is the distro.
tectonic.com @tectonic
off-the-shelf #GIFEE
Kubernetes Contributions
OIDC Authentication
RBAC Authorization
TLS Bootstrapping
rktnetes
2x Scheduler Performance
etcd 3 support
coreos-kubernetes
Bootstrap/Upgrade Simplification
Future
More Management Tools
Expand platform support
Prometheus Enhancements
Federated Clusters
Summary
Open-Source is key
Security is key
Updates are key
Containers
Orchestration
Automatic systems
We’re hiring in all departments! Email: [email protected] Positions: coreos.com/careers
90+ Projects on GitHub, 1,000+ Contributors
OPEN SOURCE
CoreOS.com - @coreoslinux - github/coreos
Secure solutions, support plans, training + more
ENTERPRISE
[email protected] - tectonic.com - quay.io
CoreOS is Running the World’s Containers