
100% Containers Powered Carpooling

Maxime Fouilleul - Database Reliability Engineer

Today’s agenda

BlaBlaCar - Facts & Figures

Infrastructure Ecosystem - 100% containers powered carpooling

Stateful Services into containers - MariaDB as an example

Next challenges - Kubernetes, the Cloud

BlaBlaCar - Facts & Figures

60 million members

Founded in 2006

1 million tonnes less CO2 in the past year

30 million mobile app downloads (iPhone and Android)

15 million travellers / quarter

Currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.

Our prod data ecosystem

MariaDB - Transactional

Redis - Volatile

PostgreSQL - Spatial

Cassandra - Distributed

Kafka - Stream

ElasticSearch - Search

Infrastructure Ecosystem - 100% containers powered carpooling

Why containers?

Homogeneous Hardware - From this

[Diagram: one service per server - srv_001 runs svc_001, srv_002 runs svc_002, and so on up to srv_014 / svc_014]

Homogeneous Hardware - To that

[Diagram: the same 14 services packed as containers onto 8 servers, srv_001 to srv_008]

Homogeneous Hardware - “Pets vs Cattle”

Easier to replace broken hardware

Cost Effective

Easier to manage

Homogeneous Deployment - trip-meeting-point application

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
---
containers:
  - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
  - aci.blbl.cr/aci-go-synapse:15-40
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-logshipper:27

nodes:
  - hostname: trip-meeting-point1
    gelf:
      level: INFO
    fleet:
      - MachineMetadata=rack=110
      - Conflicts=*trip-meeting-point*
  - hostname: trip-meeting-point2
    fleet:
      - MachineMetadata=rack=210
      - Conflicts=*trip-meeting-point*
  - hostname: trip-meeting-point3
    fleet:
      - MachineMetadata=rack=310
      - Conflicts=*trip-meeting-point*

cat ./prod-dc1/services/redis-meeting-point/service-manifest.yml
---
containers:
  - aci.blbl.cr/aci-redis:4.0.2-1
  - aci.blbl.cr/aci-redis-dictator:20
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-prometheus-redis-exporter:0.12.2-1

nodes:
  - hostname: redis-meeting-point1
    fleet:
      - MachineMetadata=rack=110
      - Conflicts=*redis-meeting-point*
  - hostname: redis-meeting-point2
    fleet:
      - MachineMetadata=rack=210
      - Conflicts=*redis-meeting-point*
  - hostname: redis-meeting-point3
    fleet:
      - MachineMetadata=rack=310
      - Conflicts=*redis-meeting-point*

ggn prod-dc1 trip-meeting-point update -y
ggn prod-dc1 redis-meeting-point update -y

Volatile by design - trip-meeting-point dependencies

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
---
containers:
  - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
  - aci.blbl.cr/aci-go-synapse:15-41
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-logshipper:27

[...]

cat ./aci-trip-meeting-point/aci-manifest.yml
---
name: aci.blbl.cr/aci-trip-meeting-point:{{.version}}
aci:
  dependencies:
    - aci.blbl.cr/aci-java:1.8.181-2
[...]

cat ./aci-java/aci-manifest.yml
---
name: aci.blbl.cr/aci-java:1.8.181-2
aci:
  dependencies:
    - aci.blbl.cr/aci-debian:9.5-9
    - aci.blbl.cr/aci-common:7

[Diagram: image dependency tree - the trip-meeting-point pod runs aci-trip-meeting-point, aci-go-synapse, aci-go-nerve, aci-logshipper and aci-hindsight; aci-trip-meeting-point depends on aci-java, which depends on aci-debian and aci-common]
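When a base image changes, everything built on top of it gets rebuilt and redeployed. A sketch of that loop, assuming dgr's standard build and push commands against the image folders above:

# Rebuild the shared java base image, push it to the registry, then
# rebuild the images that depend on it and redeploy the service.
cd aci-java && dgr build && dgr push
cd ../aci-trip-meeting-point && dgr build && dgr push
ggn prod-dc1 trip-meeting-point update -y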

Volatile - When should I redeploy?

A change in my own app/container: “immutable”

Noisy neighbours: “mutualization”

A change on a sidecar container or its dependencies

When you are ready for instability, you are HA

How?

Infrastructure Ecosystem

[Diagram: bare-metal servers (1 type of hardware, 3 disk profiles) run CoreOS in a fleet cluster backed by etcd. ggn, a "distributed init system", creates services on the cluster. dgr builds container images from each Service Codebase and stores them in the Container Registry; rkt runs them as pods on the hosts. Example pods: mysql-main1 (mysqld, monitoring, nerve) and front1 (php, nginx, monitoring, synapse, nerve). zookeeper provides Service Discovery.]

Infrastructure Ecosystem

[Diagram: the same ecosystem, with Kubernetes and Helm stepping in for fleet and ggn to schedule backend and client pods]

Service Discovery

[Diagram: go-nerve → Zookeeper → go-synapse → HAProxy]

go-nerve does health checks and reports to Zookeeper in service keys (e.g. /database/node1).

go-synapse watches Zookeeper service keys and reloads HAProxy if changes are detected.

Applications hit their local HAProxy to access backends.
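To make the last step concrete, here is a sketch of the kind of HAProxy configuration go-synapse could render from the /database service key. It is illustrative only; the real template, names and addresses are not shown in the deck:

# cat /etc/haproxy/haproxy.cfg (illustrative excerpt, generated by go-synapse)
listen database
  bind 127.0.0.1:3306
  mode tcp
  # one "server" line per node registered under /database in Zookeeper
  server node1 192.168.1.2:3306 weight 255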

Stateful Services into containers - MariaDB as an example

“Stateful” and “volatile by design”?

The recipe/prereqs/pillars to succeed:

Be Quiet! “A node should be able to restart without impacting the app”

Abolish Slavery “For a given service, every node has the same role”

Build Smart “Services can be operated by any SRE”

MariaDB as an example

Abolish Slavery “For a given service, every node has the same role”

Asynchronous vs. Synchronous

[Diagram: asynchronous replication (one Master, three Slaves) vs. a synchronous MariaDB Cluster where all nodes replicate to each other through wsrep]

MariaDB Cluster means

No Single Point of Failure

No Replication Lag

Auto State Transfers (IST/SST)

As fast as the slowest
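These properties can be checked on any node with standard Galera status variables (generic MariaDB Cluster commands, not specific to this deck):

# Cluster membership and sync state, from any node of the cluster.
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"         # number of nodes
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"  # Synced / Donor / Joined...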

The Target

[Diagram: a MariaDB Cluster of containers linked by wsrep]

Writes go on one node.

Reads are balanced on the others.

How to hit the target? Service Discovery

# zookeepercli -c lsr /services/mysql/main
mysql-main1_192.168.1.2_ba0f1f8b
mysql-main2_192.168.1.3_734d63da
mysql-main3_192.168.1.4_dde45787

# zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b
{
  "available": true,
  "host": "192.168.1.2",
  "port": 3306,
  "name": "mysql-main1",
  "weight": 255,
  "labels": {
    "host": "r10-srv4"
  }
}

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

Nerve - Track and report service status

# cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
---
override:
  tripsearch:
    database:
      read:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_rd
        port: 3307
      write:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_wr
        port: 3308

Synapse - Service discovery router

# cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
---
override:
  synapse:
    services:
      - name: mysql-main_read
        path: /services/mysql/main
        port: 3307
      - name: mysql-main_write
        path: /services/mysql/main
        port: 3308
        serverOptions: backup
        serverSort: date
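In effect, the application never needs to know where the database nodes are; illustrative commands using the users and ports defined above:

# Reads are balanced across the cluster by the local HAProxy...
mysql -h localhaproxy -P 3307 -u tripsearch_rd tripsearch
# ...while writes always land on a single node; the others are "backup" servers.
mysql -h localhaproxy -P 3308 -u tripsearch_wr tripsearch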

Be Quiet! “A node should be able to restart without impacting the app”

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            request: "SELECT 1"
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

Nerve - “Readiness Probe”

mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -P3306 -e 'SELECT 1;'

Starting Pod mysql-main1 - Nerve check is KO

Starting MySQL - Nerve check is KO

MySQL is syncing (IST/SST) - Nerve check is KO

MySQL is ready - Nerve check is OK

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
        disableCommand: "/report_remaining_processes.sh"
        disableMaxDurationInMilli: 180000

Nerve - “Grace Period”

Stop Pod - Call /disable on Nerve's API: set weight to 0, so no more new sessions will go into the service.

Wait - The remaining sessions are finishing their job:

SELECT COUNT(1) FROM processlist WHERE user LIKE 'app_%';

Pod Stopped - The service can be shut down without risk.
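The disable command itself is not shown in the deck; a minimal sketch of what a script like /report_remaining_processes.sh could do, assuming the local_mon account and the processlist query above:

#!/bin/sh
# Hypothetical sketch: poll until no application session remains,
# so the pod can be stopped without killing in-flight queries.
while true; do
  SESSIONS=$(mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -N -e \
    "SELECT COUNT(1) FROM information_schema.processlist WHERE user LIKE 'app_%';")
  echo "remaining app sessions: ${SESSIONS}"
  [ "${SESSIONS}" -eq 0 ] && exit 0
  sleep 5
done

Nerve caps this wait with disableMaxDurationInMilli (180 seconds here), so a stuck session cannot block the shutdown forever.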

Build Smart “Services can be operated by any SRE”

Use Service Discovery to find peers

Example: the wsrep_cluster_address attribute in Galera Cluster

Description: The addresses of cluster nodes to connect to when starting up. Good practice is to specify all possible cluster nodes, in the form gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port>. Specifying an empty ip (gcomm://) will cause the node to start a new cluster.

[Diagram: bootstrapping the mysql-main cluster node by node]

node1 asks the Service Discovery for mysql-main peers. No peer found!
wsrep_cluster_address = gcomm://

node2 asks and finds node1.
wsrep_cluster_address = gcomm://node1

node3 asks and finds node1, node2.
wsrep_cluster_address = gcomm://node1,node2
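This can be scripted against the Zookeeper keys shown earlier. A minimal sketch, not BlaBlaCar's actual entrypoint; the Zookeeper address and config path are hypothetical, the key layout is the one from the Service Discovery slide:

#!/bin/sh
# Derive wsrep_cluster_address from the peers currently registered
# under /services/mysql/main. Node names look like
# mysql-main1_192.168.1.2_ba0f1f8b; field 2 is the IP.
PEERS=$(zookeepercli --servers zk1:2181 -c lsr /services/mysql/main \
          | cut -d'_' -f2 | paste -sd, -)
# Empty list = no peer found: gcomm:// bootstraps a new cluster.
# Otherwise the node joins the existing peers.
echo "wsrep_cluster_address = gcomm://${PEERS}" >> /etc/mysql/galera.cnf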

Next challenges - Kubernetes, the Cloud

Kubernetes, the Cloud, why now?

Fleet is deprecated

Fleet is no longer developed and maintained by CoreOS.

Kubernetes

From a simple “distributed init system” to the standard for container orchestration.

Docker

The rkt-based implementation of Kubernetes has poor adoption.

Service Oriented Architecture

Delegated Ownership.

Google Kubernetes Engine & Managed Services

Allows us to focus on services.

3-year-old servers

We need to renew our hardware.


Kubernetes and stateful services?

Kubernetes StatefulSets

Stable, unique network identifiers.

Stable, persistent storage.

Ordered, graceful deployment, scaling and rolling updates.

StatefulSets control Pods that are based on an identical spec.

Google Kubernetes Engine...

Why are we excited about GKE?

Native support of Liveness and Readiness probes

Release granularity, from Pod to Deployment/StatefulSet

Native Service Discovery (kube-proxy and Services)

GCEPersistentDisk provisioner to manage Persistent Volumes

This, plus resource limits, makes for powerful orchestration
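A minimal sketch of where this could lead (hypothetical manifest, not the production one): a mysql-main StatefulSet whose readiness probe plays the role of Nerve's "SELECT 1" check, and whose volumeClaimTemplates give each Pod its own GCE persistent disk:

# Hypothetical StatefulSet sketch for mysql-main nodes on GKE.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql-main
spec:
  serviceName: mysql-main
  replicas: 3
  selector:
    matchLabels:
      app: mysql-main
  template:
    metadata:
      labels:
        app: mysql-main
    spec:
      containers:
        - name: mariadb
          image: mariadb:10.3
          env:
            - name: MYSQL_ALLOW_EMPTY_PASSWORD  # sketch only; use a Secret in reality
              value: "yes"
          ports:
            - containerPort: 3306
          readinessProbe:                        # same role as Nerve's SQL check
            exec:
              command: ["sh", "-c", "mysql -h 127.0.0.1 -e 'SELECT 1;'"]
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:                          # one GCE persistent disk per Pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
EOF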

See you next year for 100% GKE Powered Carpooling!
