Distributed Logging Architecture in the Container Era
LinuxCon Japan 2016, June 13, 2016
Satoshi "Moris" Tagomori (@tagomoris)
http://www.linuxfoundation.org/news-media/announcements/2016/06/chaosuan-crunchy-data-qbox-storageos-and-treasure-data-join-cloud
Topics
• Microservices and logging in various industries
• Difficulties of logging with containers
• Distributed logging architecture
• Patterns of distributed logging architecture
• Case Study: Docker and Fluentd
Logging in Various Industries
• Web access logs
  • Views/visitors on media
  • Views/clicks on ads
• Commercial transactions (e-commerce, games, ...)
• Data from devices
  • Operation logs from phone apps
  • Various sensor data
Microservices and Logging
• Monolithic service
  • A single service produces all data about a user's behavior
• Microservices
  • Many services produce data about a user's access
  • Logs must be collected from many services to know what is happening
[Diagram labels: Users, Service (Application), Logs]
Containers: "a must" for microservices
• Dividing a service into services
  • Each service requires fewer computing resources (VM -> containers)
• Making services independent from each other
  • but it is very difficult :(
  • some dependencies must be resolved even in the development environment (containers on desktop)
Redesign Logging: Why?
• No permanent storage
• No fixed physical/network address
• No fixed mapping between servers and roles
• We should parse/label logs at the source and push them to the destination ASAP
Containers: immutable & disposable
• No permanent storage
• Where to write logs?
  • Files in the container → gone with the container instance 😞
  • Directories shared from hosts → hosts are shared by many containers/services ☹
• TODO: ship logs from the container to anywhere ASAP
Containers: unfixed addresses
• No fixed physical / network address
• Where should we go to fetch logs?
  • Service discovery (e.g., Consul) → one more component 😞
  • rsync? ssh+tail? or ...? Is it installed in containers? → one more tool to depend on ☹
• TODO: push logs to anywhere from containers
Containers: instances per roles
• No fixed mapping between servers and roles
• How can we parse/store these logs?
  • Central repository of log syntax → very hard to maintain 😞
  • Label logs by source address → many containers/roles on a host ☹
• TODO: label & parse logs at source of logs
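To make "label & parse at the source" concrete, here is a minimal Python sketch of what a collector-side agent might do to one raw line. The regular expression, the field names, the "shop" service name, and the container ID are all hypothetical; real log formats and agents (such as Fluentd) differ.

```python
import re

# Hypothetical pattern for a simplified access-log line; real formats vary.
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_and_label(line, service, container_id):
    """Turn one raw line into a labeled key-value record, or None if it does not parse."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Labels are attached here, at the source, because downstream nodes
    # cannot infer the role from the sender's address.
    record["tag"] = f"{service}.access"
    record["container_id"] = container_id
    return record


raw = '192.0.2.10 - - [13/Jun/2016:10:00:00 +0900] "GET /items/1 HTTP/1.1" 200 512'
record = parse_and_label(raw, service="shop", container_id="c0ffee")
```

Because the record already carries its tag, no central repository of log syntax and no address-to-role mapping is needed further down the pipeline.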
Core Architecture
• Collector nodes (Docker containers + agent)
• Aggregator nodes
• Destinations (storage, database, ...)
Collecting and Storing Data
• Parse/Label (collector)
  • Raw logs are not good for processing
  • Convert logs to structured data (key-value pairs)
• Split/Sort (aggregator)
  • Mixed logs are not good for searching
  • Split the whole data stream into per-service streams
• Store (destination)
  • Format logs (records) as the destination expects
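The split and store steps can be sketched in a few lines of Python. This is an illustration only: the "tag" field, the sample records, and the choice of JSON lines as the destination format are assumptions, not something the slides prescribe.

```python
import json
from collections import defaultdict


def split_by_tag(records):
    """Split one mixed stream into per-service streams, keyed by the record's tag."""
    streams = defaultdict(list)
    for record in records:
        streams[record["tag"]].append(record)
    return dict(streams)


def format_json_lines(records):
    """Format records the way a (hypothetical) destination expects: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


# Already parsed and labeled by the collectors; values are made up.
mixed = [
    {"tag": "shop.access", "path": "/items/1", "status": 200},
    {"tag": "auth.access", "path": "/login", "status": 401},
    {"tag": "shop.access", "path": "/cart", "status": 200},
]
streams = split_by_tag(mixed)
shop_payload = format_json_lines(streams["shop.access"])
```

The aggregator only looks at the tag, so it does not need to know how each service formats its raw logs.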
Scaling Logging
• Network traffic
• CPU load to parse/format
  • Parse logs on each collector (distributed)
  • Format logs on aggregators (to be distributed)
• Capability
  • Make aggregators redundant
• Controlling delay
  • so that we know how soon we can see what is happening in our systems
Aggregation Patterns
[Figure: 2x2 matrix of source aggregation (NO/YES) by destination aggregation (NO/YES)]
Source Side Aggregation Patterns
[Figure: two panels, "w/o source aggregation" (collector, aggregator/destination) and "w/ source aggregation" (collector, aggregate container, aggregator/destination)]
Without Source Aggregation
• Pros:
  • Simple configuration
• Cons:
  • Fixed aggregator (endpoint) address
  • Many network connections
  • High load on the aggregator
[Diagram labels: collector, aggregator]
With Source Aggregation
• Pros:
  • Fewer connections
  • Lower load on the aggregator
  • Less configuration in containers (by specifying localhost)
  • Highly flexible configuration (by redeploying only the aggregate containers)
• Cons:
  • Slightly more resources (+1 container per host)
[Diagram labels: aggregate container, aggregator]
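With source aggregation, the only endpoint a container needs to know is localhost. The sketch below pushes newline-delimited JSON over TCP to a local agent. It is not Fluentd's forward protocol (which is MessagePack-based), and the function name, default port, and timeout are placeholders chosen for illustration.

```python
import json
import socket


def push_records(records, host="127.0.0.1", port=5170, timeout=3.0):
    """Push records as JSON lines to an agent listening on the same host.

    Returns the number of bytes sent. The host never changes from the
    container's point of view; where the logs go next is the agent's concern.
    """
    payload = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)
```

A container would call something like `push_records([{"tag": "shop.access", "status": 200}])` and leave routing, buffering, and retries to the aggregate container on the host.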
Destination Side Aggregation Patterns
[Figure: two panels, "w/o destination aggregation" (collector, destination) and "w/ destination aggregation" (collector, aggregator, destination)]
Without Destination Aggregation
• Pros:
  • Fewer nodes
  • Simpler configuration
• Cons:
  • Storage-side changes affect the collector side
  • Worse performance: many small write requests on storage
With Destination Aggregation
• Pros:
  • Collector-side configuration is free from storage-side changes
  • Better performance with fine tuning on the destination-side aggregator
• Cons:
  • More nodes
  • Slightly more complex configuration
[Diagram label: aggregator]
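The performance point is about turning many small writes into fewer large ones. A minimal sketch of a size-based buffer on a destination-side aggregator might look like the following; the class name, the `write_batch` callback, and the batch size are hypothetical, and a real aggregator would also flush on a timer and persist its buffer.

```python
class BatchingWriter:
    """Buffer records and hand them to the storage layer in batches."""

    def __init__(self, write_batch, max_records=100):
        self._write_batch = write_batch  # callable that performs one storage write
        self._max_records = max_records
        self._buffer = []

    def append(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self):
        """Write whatever is buffered; does nothing when the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)


# Stand-in for storage: remember each batch that would have been written.
write_calls = []
writer = BatchingWriter(write_calls.append, max_records=3)
for i in range(7):
    writer.append({"seq": i})
writer.flush()  # flush the remainder at shutdown
```

Seven records reach storage as three write requests instead of seven, and the batch size can be tuned on the aggregator without touching any collector.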
Scaling Patterns
• Scaling Up Endpoints
  • HTTP/TCP load balancer
  • Huge queue + workers
• Scaling Out Endpoints
  • Round-robin clients
[Diagram labels: load balancer, backend nodes; collector nodes, aggregator nodes]
Scaling Up Endpoints
• Pros:
  • Simple configuration in collector nodes
• Cons:
  • Scaling up has limits
[Diagram labels: load balancer, backend nodes]
Scaling Out Endpoints
• Pros:
  • Unlimited scaling by adding aggregator nodes
• Cons:
  • Complex configuration
  • Clients need round-robin features
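The "client features for round-robin" can be sketched as a sender that rotates through a list of aggregator endpoints and skips ones that fail. The class name, the endpoint names, and the injected `send` function are hypothetical; this is not how any particular agent implements it.

```python
import itertools


class RoundRobinSender:
    """Rotate over aggregator endpoints, skipping any that refuse the record."""

    def __init__(self, endpoints, send):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        self._send = send  # callable(endpoint, record); raises ConnectionError on failure

    def emit(self, record):
        """Try each endpoint at most once, starting from the next one in rotation."""
        last_error = None
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            try:
                self._send(endpoint, record)
                return endpoint
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("all aggregator endpoints failed") from last_error


# Fake transport for the example: one aggregator is down.
down = {"agg2:24224"}
delivered = []


def fake_send(endpoint, record):
    if endpoint in down:
        raise ConnectionError(endpoint)
    delivered.append((endpoint, record["seq"]))


sender = RoundRobinSender(["agg1:24224", "agg2:24224", "agg3:24224"], fake_send)
used = [sender.emit({"seq": i}) for i in range(4)]
```

Adding capacity means appending an endpoint to the list, which is also why every client must be reconfigured when the set of aggregators changes.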
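• Scaling up endpoints, without destination aggregation: systems in early stages
• Scaling up endpoints, with destination aggregation: collecting logs over the Internet, or using queues
• Scaling out endpoints, without destination aggregation: impossible :( (collector nodes must know all endpoints → uncontrollable)
• Scaling out endpoints, with destination aggregation: collecting logs in a datacenter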
Case Study: Docker+Fluentd
• Destination aggregation + scaling up
  • Fluent logger + Fluentd
• Source aggregation + scaling up
  • Docker json logger + Fluentd + Elasticsearch
  • Docker fluentd logger + Fluentd + Kafka
• Source/Destination aggregation + scaling out
  • Docker fluentd logger + Fluentd
Why Fluentd?
• Docker Fluentd logging driver
  • Docker containers can send logs to Fluentd directly - less overhead
• Pluggable architecture
  • Various destination systems
• Small memory footprint
  • Source aggregation requires +1 container per host
  • Less additional resource usage (< 100MB)
Destination aggregation + scaling up
• Sending logs directly over TCP with the Fluentd logger library in application code
• Same pattern as New Relic
• Easy to implement - good for startups
[Diagram label: Application code]
Source aggregation + scaling up
• Kubernetes: JSON logger + Fluentd + Elasticsearch
• Applications write logs to STDOUT
• Docker writes logs as JSON in files
• Fluentd reads logs from the files, parses the JSON objects, and writes the logs to Elasticsearch
• EFK stack (like ELK stack)
http://kubernetes.io/docs/getting-started-guides/logging-elasticsearch/
[Diagram labels: Application code, Files (JSON), Elasticsearch]
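Docker's json-file logging driver writes one JSON object per line, with "log", "stream", and "time" keys. The Python sketch below shows what "read from file, parse JSON objects" amounts to for a single line. It is an illustration, not Fluentd's implementation, and the sample line and output field names are made up.

```python
import json


def parse_docker_json_line(line):
    """Parse one line written by Docker's json-file logging driver."""
    entry = json.loads(line)
    return {
        # Docker keeps the trailing newline of each STDOUT/STDERR line.
        "message": entry["log"].rstrip("\n"),
        "stream": entry["stream"],
        "time": entry["time"],
    }


sample = '{"log":"GET /items/1 200\\n","stream":"stdout","time":"2016-06-13T01:00:00.000000000Z"}'
record = parse_docker_json_line(sample)
```

The "log" value is still an unstructured string at this point, so an application-specific parse step is needed before the record is useful in Elasticsearch.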
Source aggregation + scaling up/out
• Docker fluentd logging driver + Fluentd + Kafka
• Applications write logs to STDOUT
• Docker sends logs to Fluentd on localhost
• Fluentd receives logs over TCP and pushes them into Kafka
• Highly scalable & less overhead - very good for huge deployments
[Diagram labels: Application code, Kafka]
Source/Destination aggregation + scaling out
• Docker fluentd logging driver + Fluentd
• Applications write logs to STDOUT
• Docker sends logs to Fluentd on localhost
• Fluentd receives logs over TCP and sends them to aggregator Fluentd nodes with round-robin load balancing
• Highly flexible - good for complex data processing requirements
[Diagram label: Any other storage]
What's the Best?
• Writing logs from containers: there are several ways to do it
  • Docker logging driver
  • Write logs to files + read/parse them
  • Send logs from apps directly
• Make the platform scalable!
  • Source aggregation: Fluentd on localhost
  • Scalable storage (Kafka, external services, ...)
    • No destination aggregation + scaling up
  • Non-scalable storage (filesystems, RDBMSs, ...)
    • Destination aggregation + scaling out
Why OSS?
• The logging layer is an interface
  • Transparency
  • Interoperability
• Keep the platform scalable
  • Number of nodes
  • Number of types of source/destination