Flume: Reliable Distributed Streaming Log Collection (transcript)
Jonathan Hsieh, Henry Robinson, Patrick Hunt
Cloudera, Inc
7/15/2010
Scenario
• Situation:
– You have hundreds of services producing logs in a datacenter.
– They produce a lot of logs that you want to analyze.
– You have Hadoop, a system for processing large volumes of data.
• Problem:
– How do I reliably ship all my logs to a place where Hadoop can analyze them?
Use cases
• Collecting logs from nodes in Hadoop cluster
• Collecting logs from services such as httpd, mail, etc.
• Collecting impressions from custom apps for an ad network
• But wait, there’s more!
– Basic availability metrics
– Basic online in-stream analysis
It’s log, log .. Everyone wants a log!
A sample topology
[Diagram: a Master oversees an agent tier of Agents feeding a collector tier of Collectors, which write into HDFS under time-bucketed paths such as /logs/web/2010/0715/1200, /logs/web/2010/0715/1300, and /logs/web/2010/0715/1400.]
You need a “Flume”
• Flume is a distributed system that gets your logs from their source and aggregates them to where you want to process them.
• Open source, Apache v2.0 License
• Goals:
– Reliability
– Scalability
– Extensibility
– Manageability
(Image: Columbia Gorge, Broughton Log Flume)
Key abstractions
• Data path and control path
• Nodes are in the data path
– Nodes have a source and a sink
– They can take different roles: a typical topology has agent nodes and collector nodes, and optionally processor nodes.
• Masters are in the control path.
– Centralized point of configuration.
– Specify sources and sinks
– Can control flows of data between nodes
– Use one master, or use many with a ZooKeeper-backed quorum
[Diagram: a Master controlling an Agent and a Collector.]
A sample topology
[Diagram: the same topology annotated by tier: an agent tier of Agents, a collector tier of Collectors, and a storage tier (HDFS with time-bucketed paths such as /logs/web/2010/0715/1200), all overseen by the Masters.]
Outline
• What is Flume? Goals and architecture
• Reliability: fault tolerance and high availability
• Scalability: horizontal scalability of all nodes and masters
• Extensibility: Unix principle; all kinds of data, all kinds of sources, all kinds of sinks
• Manageability: centralized management supporting dynamic reconfiguration
RELIABILITY
The logs will still get there.
Failures
• Faults can happen at many levels
– Software applications can fail
– Machines can fail
– Networking gear can fail
– Networks can become congested or machines excessively loaded
– Nodes can go down for maintenance
• How do we make sure that events make it to a permanent store?
Tunable data reliability levels
• Best effort
– Fire and forget
• Store on failure + retry
– Local acks; local errors detectable
– Failover when faults are detected
• End-to-end reliability
– End-to-end acks
– Data survives compound failures, and may be retried multiple times
[Diagram: an Agent → Collector → HDFS pipeline illustrating each level.]
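These levels map onto the configuration language as a choice of agent sink. A minimal sketch, assuming the 0.9-era sink names (agentBESink, agentDFOSink, agentE2ESink) and a collector host named collector1; treat the exact names as assumptions:

```
web1 : tail("/var/log/httpd/access_log") | agentBESink("collector1");
web2 : tail("/var/log/httpd/access_log") | agentDFOSink("collector1");
web3 : tail("/var/log/httpd/access_log") | agentE2ESink("collector1");
```

Here web1 fires and forgets, web2 spools to local disk and retries on failure, and web3 holds data until an end-to-end ack comes back from the final sink.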
Dealing with Agent failures
• We do not want to lose data
• Make events durable at the generation point.
– If a log generator goes down, it is not generating logs.
– If the event generation point fails and recovers, data will reach the end point
• Data is durable and survives machine crashes and reboots
– Allows for synchronous writes in log generating applications.
• Watchdog program to restart agent if it fails.
Dealing with Collector Failures
• Data is durable at the agent:
– Minimize the amount of state and possible data loss
– Not necessary to durably keep intermediate state at collector
– Retry if collector goes down.
• Use hot failover so agents can use alternate paths:
– Master predetermines failovers to load balance when collectors go down.
Master Service Failures
• A master machine should not be a single point of failure!
• Masters keep two kinds of information:
– Configuration information (node/flow configuration): kept in a ZooKeeper ensemble for a persistent, highly available metadata store; failures are easily recovered from.
– Ephemeral information (heartbeat info, acks, metrics reports): kept in memory; failures will lose this data, but it can be lazily replicated.
SCALABILITY
(Image: Logs jamming the Kemi River)
Data path is horizontally scalable
• Add collectors to increase availability and to handle more data
– Assumes a single agent will not dominate a collector
– Fewer connections to HDFS
– Larger, more efficient writes to HDFS
• Agents have mechanisms for machine resource tradeoffs
– Write logs locally to avoid collector disk I/O bottlenecks and catastrophic failures
– Compression and batching (trade CPU for network)
– Push computation into the event collection pipeline (balance I/O, memory, and CPU resource bottlenecks)
[Diagram: Agents → Collector → HDFS.]
Load balancing
• Agents are logically partitioned and send to different collectors
• Use randomization to pre-specify failovers when many collectors exist
– Spread load if a collector goes down
– Spread load if new collectors are added to the system
[Diagram: partitioned Agents sending to three Collectors.]
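With master-computed randomized failover chains, the per-agent configuration can stay generic. As a sketch (the "auto" chain sink below is remembered from the 0.9-era sink catalog; treat the name as an assumption):

```
web1 : tail("/var/log/httpd/access_log") | autoE2EChain;
web2 : tail("/var/log/httpd/access_log") | autoE2EChain;
```

Each agent receives from the master a randomized, ordered list of collectors and fails over down that list, which spreads load when a collector dies or a new one joins.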
Control plane is horizontally scalable
• A master controls dynamic configuration of nodes
– Uses a consensus protocol to keep state consistent
– Scales well for configuration reads
– Allows for adaptive repartitioning in the future
• Nodes can talk to any master.
• Masters can talk to any ZooKeeper ensemble member.
[Diagram: three Masters, each paired with a ZooKeeper member (ZK1–ZK3), serving many Nodes.]
EXTENSIBILITY
Turn raw logs into something useful…
Flume is easy to extend
• Simple source and sink APIs
– Event granularity streaming design
– Many simple operations that compose into complex behavior
• End-to-end principle
– Put smarts and state at the end points. Keep the middle simple.
• Flume deals with reliability.
– Just add a new source or a new sink, and Flume’s primitives handle reliability for it.
Variety of Data sources
• Can deal with both push and pull sources.
• Supports many legacy event sources:
– Tailing a file
– Output from a periodically exec’ed program
– Syslog, syslog-ng
– Experimental: IRC / Twitter / Scribe / AMQP
[Diagram: applications feed Agents by pushing events, by being polled, or by embedding an agent.]
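A few of these sources in the configuration language, as a hedged sketch (the source names tail, syslogTcp, and execPeriodic are from memory of the 0.9 catalog; the paths, port, and collector name are invented for illustration):

```
web1  : tail("/var/log/httpd/access_log") | agentSink("collector1");
mail1 : syslogTcp(5140)                   | agentSink("collector1");
cron1 : execPeriodic("df -k", 60000)      | agentSink("collector1");
```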
Variety of Data output
• Send data to many sinks
– Files, HDFS, console, RPC
– Experimental: HBase, Voldemort, S3, etc.
• Supports an extensible variety of output formats and destinations
– Output to language-neutral, open data formats (JSON, Avro, text)
– Compressed output files in development
• Uses decorators to process event data in flight
– Sampling, attribute extraction, filtering, projection, checksumming, batching, wire compression, etc.
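Decorators wrap a sink and transform events on the way through. In the 0.9-era syntax this looks roughly like `{ decorator => sink }`; the decorator names below (intervalSampler, gzip) are assumptions from memory:

```
web1 : tail("/var/log/httpd/access_log") | { intervalSampler(10) => { gzip => agentSink("collector1") } };
```

This would keep every 10th event and compress event bodies before shipping them to the collector.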
MANAGEABILITY
Wheeeeee!
Centralized data flow management
• One place to specify node sources, sinks and data flows.
– Simply specify the role of the node: collector, agent
– Or specify a custom configuration for a node
• Control Interfaces:
– Flume Shell
– Basic web interface
– HUE + Flume Manager App (Enterprise users)
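A typical session with the Flume shell might look like the following; the command names and default admin port are remembered from the 0.9 docs and should be treated as assumptions:

```
$ flume shell
flume> connect master:35873
flume> exec config web1 'tail("/var/log/httpd/access_log")' 'agentSink("collector1")'
flume> getconfigs
flume> quit
```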
Output bucketing
• Automatic output file management
– Writes HDFS files into time-based buckets
7/15/2010 31
HDFS output:
/logs/web/2010/0715/1200/data-xxx.txt
/logs/web/2010/0715/1200/data-xxy.txt
/logs/web/2010/0715/1300/data-xxx.txt
/logs/web/2010/0715/1300/data-xxy.txt
/logs/web/2010/0715/1400/data-xxx.txt
…
Collector configuration:
node : collectorSource | collectorSink("hdfs://namenode/logs/web/%Y/%m%d/%H00", "data")
Simplified configurations
• To make configuring Flume nodes higher level, we use logical nodes.
– The Flume node process is a physical node
– Each Flume node process can host multiple logical nodes
• This:
– Reduces the amount of detail required in configurations
– Reduces process-centric management overhead
– Allows for finer-grained resource control and isolation within flows
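In the shell, logical nodes are placed on physical nodes and then configured independently; a sketch assuming a 0.9-era `exec map` command (node names invented for illustration):

```
flume> exec map host1 web-flow
flume> exec config web-flow 'tail("/var/log/httpd/access_log")' 'agentSink("collector1")'
```

Unmapping or remapping a logical node moves its flow to other hardware without touching the rest of host1's configuration.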
Flow Isolation
• Isolate different kinds of data when and where it is generated
– Have multiple logical nodes on a machine
– Each has its own data source
– Each has its own data sink
[Diagram: logical Agent nodes on shared machines sending separate flows to separate Collectors.]
For advanced users
• A concise and precise configuration language for specifying arbitrary data paths.
– Dataflows are essentially DAGs
– Control specific event flows
• Enable durability and failover mechanisms
• Tune the parameters of these mechanisms
– Dynamic updates of configurations
• Allows for live failover changes
• Allows for handling newly provisioned machines
• Allows for changing analytics
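An explicitly wired two-flow configuration might read as below. The %Y/%m%d escapes follow the collectorSink example earlier in the deck, while the hosts, ports, and paths are invented for illustration and the sink names are assumptions from the 0.9 catalog:

```
agentA     : tail("/var/log/app/a.log") | agentDFOSink("collectorA", 35853);
agentB     : tail("/var/log/app/b.log") | agentE2ESink("collectorB", 35853);
collectorA : collectorSource(35853) | collectorSink("hdfs://namenode/logs/a/%Y/%m%d", "adata");
collectorB : collectorSource(35853) | collectorSink("hdfs://namenode/logs/b/%Y/%m%d", "bdata");
```

Flow A tolerates collector outages with local disk failover, while flow B insists on end-to-end acks.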
CONCLUSIONS
Summary
• Flume is a distributed, reliable, scalable system for collecting and delivering high-volume continuous event data such as logs
– Tunable data reliability levels
– Reliable master backed by ZooKeeper
– Writes data to HDFS into buckets ready for batch processing
– Dynamically configurable nodes
– Simplified, automated management for agent+collector topologies
• Open source, Apache v2.0 license.
Contribute!
• GitHub source repo
– http://github.com/cloudera/flume
• Mailing lists
– User: https://groups.google.com/a/cloudera.org/group/flume-user
– Dev: https://groups.google.com/a/cloudera.org/group/flume-dev
• Development trackers
– JIRA (bugs / formal feature requests): https://issues.cloudera.org/browse/FLUME
– Review Board (code reviews): http://review.hbase.org -> http://review.cloudera.org
• IRC channel
– #flume @ irc.freenode.net
Image credits
• http://www.flickr.com/photos/victorvonsalza/3327750057/
• http://www.flickr.com/photos/victorvonsalza/3207639929/
• http://www.flickr.com/photos/victorvonsalza/3327750059/
• http://www.emvergeoning.com/?m=200811
• http://www.flickr.com/photos/juse/188960076/
• http://www.flickr.com/photos/23720661@N08/3186507302/
• http://clarksoutdoorchairs.com/log_adirondack_chairs.html
• http://www.flickr.com/photos/dboo/3314299591/