Microservices on GKE at Mercari · crash.academy · 2018-06-01
TRANSCRIPT
Microservices On GKE At Mercari
GCPUG Tokyo Kubernetes Engine Day
@deeeet
Background
Start with Monolith
Small overhead for cross domains 👍 Reusable code across domains 👍 Effective operation by SRE team 👍
3 scalabilities
Growth of business Growth of features Growth of organization
Huge Monolith
Difficult to understand change effect 👎 Difficult to test 👎 Difficult to on-board 👎 Difficult to isolate failure 👎 Difficult to scale independently 👎 Difficult to try new technologies 👎
Growth of business Growth of features Growth of organization
Unclear ownership 😩 Communication overhead 😩
Velocity is stalled ☔
Microservices
Microservices is a software development technique that structures an application as a collection of loosely coupled services with the smallest autonomous boundary.
Technical benefit
Easy to test 👍 Easy to deploy 👍 Easy to on-board 👍 Easy to isolate failure 👍 Easy to scale independently 👍
Organization benefit
Clear ownership 😁 Minimum communication overhead 😁
Deliver new features faster ☀
How Microservices?
Gateway pattern Strangler pattern
[Diagram: API Gateway in front of Mercari API, Service A, Service B and Service X]
Multiple services on a single endpoint
SSL termination
DDoS protection
Common AuthZ/AuthN
Gateway pattern Strangler pattern
[Diagram sequence: Mercari API and Service X behind the API Gateway; Service A, then Service B, move next to Service X]
[Diagram sequence: Function X, Function Y and Function Z sit in Mercari API; Facade C is added in front of Service C; the functions move to Service C one at a time (X, then Y, then Z); Facade C is then removed]
[Diagram: later, Function Z is placed under a new Service D while Function X and Function Y stay in Service C]
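The strangler steps in the diagrams above can be sketched as a routing table in the gateway. This is a minimal illustration, not Mercari's gateway code: the `Gateway` class and the backend names `mercari-api`, `service-c`, `function-x` and `function-y` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Routes each API function to a migrated service or to the monolith."""

    monolith: str = "mercari-api"  # hypothetical backend name
    migrated: dict[str, str] = field(default_factory=dict)

    def migrate(self, function: str, service: str) -> None:
        """Record that `function` is now served by `service`."""
        self.migrated[function] = service

    def route(self, function: str) -> str:
        """Return the backend that should handle `function`."""
        return self.migrated.get(function, self.monolith)


gateway = Gateway()
# Move one function at a time; everything else stays on the monolith.
gateway.migrate("function-x", "service-c")
print(gateway.route("function-x"))  # service-c
print(gateway.route("function-y"))  # mercari-api
```

Because callers only ever see the gateway, a function can be moved (or moved back) by changing one routing entry.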
Current Status
API Gateway
Service A
Service B
Service X
Mercari API
Technical Stack
API Gateway
Google Cloud Load Balancing
Authority
Service A
Service B
Sakura
Service X
Mercari API
GCP
Kubernetes Engine
Cloud Resources Managed Services
Container
Over HTTP
Routing to microservices
Protocol transformation (HTTP to gRPC)
Common logging & tracing
Request buffering
SSL termination
DDoS protection
Cloud Armor?
Common AuthZ/AuthN
Managed DB
"Another important takeaway is that even though all of these listed items are important, ultimately the most critical thing is observability. As I like to say: observability, observability, observability."
- Matt Klein, Seeking SRE (Chapter 6)
[Diagram: Service A and Service B communicating over the network]
AuthN and AuthZ? API limit?
Load balancing? Request timeout? Request retry with backoff? Circuit breaking?
Logging? Tracing? (Observability)
Different protocols..
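Two of the questions above, request retry with backoff and circuit breaking, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not what the talk's services run: the function and class names are hypothetical, and the breaker has no half-open state or reset timer.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=2.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_retry(func, retries=3, base=0.1, cap=2.0, sleep=time.sleep):
    """Call func(); on an exception, wait with jittered backoff and retry up to `retries` times."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Full jitter: sleep a random time up to the computed delay.
            sleep(random.uniform(0, delays[attempt]))


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then rejects calls without trying them."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, func):
        if self.open:
            raise RuntimeError("circuit open")
        try:
            result = func()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Every service that calls another service needs this logic, in every language in use, which is why it matters where these concerns live.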
[Diagram: Service A, Service B, Service C, Service D, with Service B repeated three times]
How we use GCP?
API Gateway
Google Cloud Load Balancing
Authority
Service X
GCP
Kubernetes Engine
How we use GKE?
Cluster strategy GCP project strategy Node pool strategy Namespace strategy
asia-northeast1
us-west1
europe-west1
Each region has its own Cluster
Production Cluster
Development Cluster
Testing/QA will be done in development cluster
All services in 1 cluster No special cluster for specific service
Production Cluster
In the future: one cluster per region, like Google Borg
Cluster strategy GCP project strategy Node pool strategy Namespace strategy
GCP project: GKE Production
Production Cluster
GCP project: GKE Development
Development Cluster
IAM: SRE IAM: SRE + α
1 cluster for 1 GCP project
Only SRE can access cluster nodes
Cluster strategy GCP project strategy Node pool strategy Namespace strategy
GCP project: GKE Production
Production Cluster
n1-standard-16 node pool: Normal applications
n1-highmem-16 node pool: Machine learning workloads
Auto scaling: Enabled
Automatic node repair: Enabled
Preemptible: Enabled (only in US)
Cluster strategy GCP project strategy Node pool strategy Namespace strategy
Each service has its own Kubernetes namespace
GCP project: GKE Production
Namespace: Service A
Pod: A Pod: A Pod: A
Namespace: Service B
Pod: B Pod: B
Production Cluster
RBAC: Team X
RBAC: Team X
Each team can only access its own Kubernetes namespace
How we use GCP services?
How to limit access to GCP services? Each service should be allowed to access only its own GCP resources
GCP project: GKE Production (IAM: SRE)
Production Cluster
Namespace: Service A (RBAC: Team X)
Pod: A Pod: A Pod: A
Namespace: Service B (RBAC: Team Y)
Pod: B Pod: B
GCP project: Service A (IAM: Team X + SRE)
Cloud SQL
GCP project: Service B (IAM: Team Y + SRE)
Spanner
Each service has its own GCP project
Each namespace has its own service account for its own GCP project
Service resources live in the service's own GCP project
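The one-namespace, one-project, one-service-account layout above can be written down as a small model. This is a sketch only: the `ServiceLayout` class, the `example-` project ID prefix and the `sre` group name are hypothetical; the one fixed fact is the GCP service account email form `NAME@PROJECT_ID.iam.gserviceaccount.com`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLayout:
    """Derives the isolated resources of one microservice from its name and owning team."""

    service: str  # e.g. "service-a"
    team: str     # e.g. "team-x"

    @property
    def namespace(self) -> str:
        # One Kubernetes namespace per service.
        return self.service

    @property
    def gcp_project(self) -> str:
        # One GCP project per service (hypothetical ID scheme).
        return f"example-{self.service}"

    @property
    def service_account(self) -> str:
        # One service account per namespace, created in the service's own project.
        return f"{self.service}@{self.gcp_project}.iam.gserviceaccount.com"

    @property
    def project_iam_members(self) -> tuple[str, ...]:
        # The owning team plus SRE can access the service's project.
        return (self.team, "sre")


def may_access(layout: ServiceLayout, resource_project: str) -> bool:
    """A service may only touch resources in its own GCP project."""
    return resource_project == layout.gcp_project


a = ServiceLayout("service-a", "team-x")
b = ServiceLayout("service-b", "team-y")
print(a.service_account)
print(may_access(a, b.gcp_project))
```

Deriving every name from the service name keeps the mapping between namespace, project and service account mechanical.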
GCP project creation…? Spanner or Cloud SQL setup…?
Infrastructure as Code
Cloud SQL instance creation
Spanner instance creation
mercari / microservices-terraform Private
Just create a PR to create a new GCP project
Terraform plan on CI
Terraform apply on CI
The tool for notifying Terraform results is open source: https://github.com/mercari/tfnotify
Common parts (GCP project creation, PagerDuty setup) can be bootstrapped
Stackdriver
Logging…?
How to limit access to Stackdriver Logging? Each team should be allowed to access only its own service's logs
[Diagram: Stackdriver logs flow through one sink per service into BigQuery]
Create BQ for each service
Create BQ sink for each service
BigQuery sink creation
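A per-service sink boils down to a destination plus a log filter scoped to the service's namespace. The sketch below only builds the definitions as plain dictionaries and calls no GCP API. The helper name, the sink naming scheme, the `example-` project IDs and the `logs` dataset are hypothetical, and the filter assumes the `k8s_container` resource type with a `namespace_name` label (older GKE logging used `container` and `namespace_id`).

```python
def bigquery_sink(service: str, project: str, dataset: str) -> dict:
    """Build a log sink definition that exports one namespace's logs to BigQuery."""
    return {
        "name": f"{service}-to-bigquery",  # hypothetical naming scheme
        # Sink destinations for BigQuery name the target project and dataset.
        "destination": f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}",
        # Assumed filter: only container logs from this service's namespace.
        "filter": (
            'resource.type="k8s_container" '
            f'AND resource.labels.namespace_name="{service}"'
        ),
    }


# One sink per service, each writing into that service's own project.
sinks = [
    bigquery_sink(service, f"example-{service}", "logs")
    for service in ("service-a", "service-b")
]
for sink in sinks:
    print(sink["name"], "->", sink["destination"])
```

With the logs in a dataset inside the service's own project, access follows the project's IAM (team + SRE) rather than the shared cluster project.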
GCP and k8s Ecosystem
Just create an Ingress and DNS records are created automatically
with Cloud DNS
Disaster recovery: take backups of your cluster and restore in case of loss.
with Cloud Storage
Non GCP?
Notification or Integration with GitHub
vs. Container Builder
Integration with external services like CDN or AWS
vs. Stackdriver monitoring
vs. Stackdriver Error Reporting
Notification and Integration with GitHub
vs. ??
GCP does not have chaos as a service
Conclusion
Mercari ❤
@deeeet