Chukwa: a scalable log collector
Ari Rabkin and Randy Katz, UC Berkeley
USENIX LISA 2010
With thanks to… Eric Yang, Jerome Boulon, Bill Graham, Corbin Hoenes, and all the other Chukwa developers, contributors, and users
Why collect logs?
• Many uses
  – Need logs to monitor/debug systems
  – Machine learning is getting increasingly good at detecting anomalies automatically
  – Web log analysis is key to many businesses
• Easier to process if centralized
Three Bets
1. MapReduce processing is necessary at scale.
2. Reliability matters for log collection.
3. Should use Hadoop, not re-write storage and processing layers.
Leveraging Hadoop
• Really want to use HDFS for storage and MapReduce for processing
  + Highly scalable, highly robust
  + Good integrity properties
• HDFS has quirks
  – Files should be big
  – No concurrent appends
  – Weak synchronization semantics
The architecture (diagram): data sources (App1 log, App2 log, metrics, …) feed an Agent on each node (holds data for seconds); Agents send to Collectors, one per ~100 nodes (seconds); Collectors write to a Data Sink in HDFS (~5 minutes); MapReduce jobs move data into archival storage (kept indefinitely) and into a SQL DB (or HBase).
Design envelope
Chart: number of hosts vs. per-host data rate (bytes/sec), with regions labeled "Chukwa not needed – clients should write direct to HDFS," "Don't need Chukwa: use NFS instead," "Need better FS!," and "Need more aggressive batching or fan-in control."
Respecting boundaries
• Architecture captures the boundary between monitoring and production services
  – Important in practice!
  – Particularly nice in a cloud context
Diagram: the system being monitored (App1 log, App2 log, metrics, …) feeds Agents; a control protocol separates it from the monitoring system (Collector, Data Sink, structured storage).
Comparison (figure contrasting Amazon CloudWatch, which is metrics-oriented, with log collection).
Data sources
• We optimize for the case of logs on disk
  – Supports legacy systems
  – Writes to local disk almost always succeed
  – Kept in memory in practice, thanks to filesystem caching
• Can also handle other data sources; adaptors are pluggable
  – Supports syslog, other UDP sources, and JMS messages
Reliability
• Agents can crash
• Record how much data from each source has been written successfully
• Resume at that point after a crash
• Fix duplicates in the storage layer

Diagram: the data stream is divided into "sent and committed" vs. "not committed."
Incorporating Asynchrony
• What about collector crashes?
• Want to tolerate asynchronous HDFS writes without blocking the agent
• Solution: asynchronous acks – tell the agent where the data will be written if the write succeeds
• Uses the single-writer aspect of HDFS
Diagram: the agent sends data to the collector; the collector replies "in Foo.done @ 3000" (i.e., the data is committed once Foo.done reaches length 3000); the agent later queries HDFS for the length of Foo.done to learn whether the data is committed.
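The commit check described above amounts to comparing a promised (file, offset) pair against the sink file's current length. A hedged sketch (the function name is invented; a real agent would ask HDFS for the length):

```python
# Illustrative sketch of the asynchronous-ack check: the collector promises
# "your data will be in file F once F reaches length N"; the agent later
# compares N against the file's actual length. This is sound because each
# sink file has a single writer (HDFS's single-writer property).

def is_committed(promised_file, promised_length, file_lengths):
    """file_lengths: map of sink file -> current length (e.g. from HDFS)."""
    return file_lengths.get(promised_file, 0) >= promised_length
```

If the collector crashes before the write lands, the file never reaches the promised length, and the agent knows to resend via a different collector.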
Fast path
Fast-path architecture (diagram): as before, Agents (one per node, seconds) feed Collectors (one per ~100 nodes, seconds), which write a Data Sink in HDFS (~5 minutes); MapReduce jobs produce cleaned data for storage (kept indefinitely). In addition, collectors forward selected data directly to fast-path clients within seconds.
Two modes
Robust delivery
• Data visible in minutes
• Collects everything
• Stores to HDFS
• Will resend after a crash
• Facilitates MapReduce
• Used for bulk analysis

Prompt delivery
• Data visible in seconds
• User-specified filter
• Written over a socket
• Delivered at most once
• Facilitates near-real-time monitoring
• Used for real-time graphing
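The prompt-delivery mode boils down to a user-specified pattern match over chunks at the collector. A sketch of that filtering step (illustrative Python; Chukwa's actual fast path is Java and forwards matches to a listener over a TCP socket):

```python
# Sketch of the prompt-delivery filter (illustrative, not Chukwa code):
# the collector checks each chunk against a user-supplied pattern and
# forwards only the matches; everything still takes the robust HDFS path.

import re

def fast_path_filter(chunks, pattern):
    """Return the chunks a collector would forward to a real-time listener."""
    rx = re.compile(pattern)
    return [c for c in chunks if rx.search(c)]
```

Because delivery on this path is at most once, a listener that misses a chunk can still recover it later from the robust path.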
Overhead [with Cloudstone]
Bar chart: operations per second without Chukwa vs. with Chukwa (y-axis spanning roughly 46–54 ops/sec).
Collection rates
• Tested on EC2
• Able to write 30 MB/sec/collector
• Note: data is about 12 months old

Chart: collector write rate (MB/sec, roughly 0–35) vs. agent send rate (MB/sec, 10–90).
Collection rates
• Scales linearly
• Able to saturate the underlying FS

Chart: total throughput (MB/sec, roughly 40–220) vs. number of collectors (4–20).
Experiences
• Currently in use at:
  – UC Berkeley's RAD Lab, to monitor cloud experiments
  – CBS Interactive, Selective Media, and Tynt, for web log analysis (dozens of machines; gigabytes to terabytes per day)
• Other sites too… we don't have a census
Related Work

| System | Handles logs | Crash recovery? | Metadata | Interface | Agent-side control |
|---|---|---|---|---|---|
| Ganglia/Nagios/other NMS | No | No | No | UDP | No |
| Scribe | Yes | No | No | RPC | Yes |
| Flume | Yes | Yes | Yes | flexible | No |
| Chukwa | Yes | Yes | Yes | flexible | Yes |
Next steps
• Tighten security, to make Chukwa suitable for world-facing deployments
• Adjustable durability – should be able to buffer arbitrary non-file data for reliability
• HBase for near-real-time metrics display
• Built-in indexing
• Your idea here: exploit open source!
Conclusions
• Chukwa is a distributed log collection system that is:
  – Practical: in use at several sites
  – Scalable: builds on Hadoop for storage and processing
  – Reliable: able to tolerate multiple concurrent failures without losing or mangling data
  – Open source: a former Hadoop subproject, currently in Apache incubation, en route to top-level project
Questions?
…vs Splunk
• Significant overlap with Splunk
  – Splunk uses syslog for transport
  – Recently shifted towards MapReduce for evaluation
• Chukwa on its own doesn't [yet] do indexing or analysis
• Chukwa helps extract data from systems, reliably and customizably
Assumptions about App
• Processing should happen off-node (production hosts are sacrosanct)
• Data should be available within minutes
  – Sub-minute delivery is a non-goal
• Data rates between 1 and 100 KB/sec/node
  – Architecture tuned for these cases, but Chukwa could be adapted to handle lower/higher rates
• No assumptions about data format
• Administrator or app needs to tell Chukwa where logs live
  – Support for directly streaming logs as well
On the back end
• Chukwa has a notion of parsed records, with complex schemas
  – Can put into structured storage
  – Display with HICC, a portal-style web interface
Not storage, not processing
• Chukwa is a collection system
  – Not responsible for storage: use HDFS. Our model is store everything, prune late
  – Not responsible for processing: use MapReduce, or a custom layer on HDFS
• Responsible for facilitating storage and processing
  – Framework for processing collected data
  – Includes Pig support
Goal: Low Footprint
• Wanted minimal footprint on the system and minimal changes to user workflow
  – Application logging need not change
  – Local logs stay put; Chukwa just copies them
  – Can either specify filenames in static config, or else do some dynamic discovery
• Minimal human-produced metadata
  – We track what data source and host each chunk came from; can store additional tags
  – Chunks are numbered; can reconstruct order
  – No schemas required to collect data
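Because each chunk carries its source, host, and sequence number, removing crash-resend duplicates and reconstructing order reduces to a key-based pass like the following sketch (illustrative only, not Chukwa's actual MapReduce job):

```python
# Sketch (illustrative): chunks tagged with (source, host, seq) let the
# storage layer drop duplicates and rebuild original order, with no
# human-written schema needed for collection.

def dedup_and_order(chunks):
    """chunks: list of dicts with 'source', 'host', 'seq', and 'data' keys."""
    seen, out = set(), []
    for c in chunks:
        key = (c["source"], c["host"], c["seq"])
        if key not in seen:      # keep only the first copy of each chunk
            seen.add(key)
            out.append(c)
    # sort by (source, host, seq) to reconstruct per-source order
    return sorted(out, key=lambda c: (c["source"], c["host"], c["seq"]))
```

In the real system this cleanup happens in the storage layer as a batch job, which is why duplicates from agent resends are harmless.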
MapReduce and Hadoop
• Major motivation for Chukwa was storing and analyzing Hadoop logs
  – At Yahoo!, common to dynamically allocate hundreds of nodes for a particular task
  – This can generate MBs of logs per second
  – Log analysis becomes difficult
Why Ganglia doesn’t do this
• Many systems exist for metrics collection
  – Ganglia is particularly well-known
  – Many similar systems, including network management systems like OpenView
  – Focus on collecting and aggregating metrics in a scalable, low-cost way
• But logs aren't metrics: want to archive everything, not summarize aggressively
• Really want reliable delivery; missing key parts of logs might make the rest useless
Clouds
• Log processing needs to be scalable, since apps can get big quickly
• This used to be a problem for the Microsofts and Googles of the world; now it affects many more
• Can't rely on local storage
  – Nodes are ephemeral
  – Need to move logs off-node
• Can't do analysis on a single host
  – The data is too big
Questions about Goals
• How many nodes? How much data?
• What data sources and delivery semantics?
• Processing expressiveness?
• Storage?
Chukwa goals
• How many nodes? How much data?
  – Scale to thousands of nodes. Hundreds of KB/sec/node on average; bursts above that are OK
• What data sources and delivery semantics?
  – Console logs and metrics. Reliable delivery (as much as possible). Minutes of delay are OK
• Processing expressiveness?
  – MapReduce
• Storage?
  – Should be able to store data indefinitely; support petabytes of stored data
In contrast
• Ganglia, network management systems, and Amazon's CloudWatch are all metrics-oriented
  – Goal is collecting and disseminating numerical metrics data in a scalable way
• Significantly different problem
  – Metrics have well-defined semantics
  – Can tolerate data loss
  – Easy to aggregate/compress for archiving
  – Often time-critical
• Chukwa can serve these purposes, but isn't optimized for them
Real-time Chukwa
• Chukwa was originally designed to support batch processing of logs
  – Minutes of latency OK
• But we can do [best-effort] real-time "for free"
  – Watch data go past at the collector
  – Check chunks against a search pattern; forward matching ones to a listener via TCP
  – Don't need long-term storage or reliable delivery (do those via the regular data path)
• The Director uses this real-time path
Related work summary
• Ganglia (and traditional NMS) don't do large data volumes or data rates
• Facebook's Scribe + Hive
  – Scribe is streaming, not batch
  – Hive is batch, atop Hadoop
  – Doesn't do collection or visualization
  – Doesn't have strong reliability properties
• Flume (from Cloudera)
  – Very similar to Chukwa
  – Emphasis on centralized management