flume office-hours-110228

Posted on 20-Aug-2015

Flume Office Hours: Community planning

Jonathan Hsieh, Cloudera HQ, 2/28/2011

Flume Office Hours, 2/28/2011 3

Outline

• State of the world
• What’s new?
• Stories (chime in!)
• What needs work?
• Prioritizing what is next
• Q+A

STATE OF THE WORLD

Growing user and developer community

• GitHub stats:
  – Currently 295 watchers, 51 forks

• New committers:
  – 9/10: Eric Sammer (Cloudera)
  – 1/11: Bruce Mitchener (Independent)

• User characteristics:
  – Most potential users seem to use ad hoc scripts
  – Most users are early adopters / startup devops

[Chart: GitHub watchers (0 to 350 scale) and forks (0 to 60 scale), May-10 through Feb-11]

A short feature history

• 6/10: v0.9.0
  – Initial open source release
• 8/10: v0.9.1
  – Fixes for hangs
  – Initial compression features
• 10/10: v0.9.1+29 (CDH3b3, packages)
  – Added Kerberized HDFS support
  – Flume cookbook
  – ElasticSearch / Cassandra plugins
  – Initial Voldemort plugins
• 11/10: v0.9.2
  – Support for other compression codecs
  – Avro RPC
  – Improvements to tail and exec
  – Robustness improvements
  – Initial HBase / MongoDB plugins
• 2/11: v0.9.3 (CDH3b4, packages)
  – Flume node Windows support
  – Initial JSON metrics support
  – Multi-master functional
  – Robustness improvements
  – JRuby / AMQP plugins
  – S3/EC2 blog stories
• 4/11: v0.9.3+xxx (CDH3 stable, packages)
  – Excessive duplication fixes
  – Compression fixes
• ?/11: v0.9.4

WHAT’S NEW?

New features

• Flume node JSON metrics
  – http://node:35862/node/reports

• Terser syntax
  – { deco1 => { deco2 => sink } } can now be written as: deco1 deco2 sink

• Multiple collector sink support
    collector(30000) { [ escapedCustomDfs("hdfs://nn1/path", "prefix", "format"),
                         escapedCustomDfs("hdfs://nn2/path", "prefix", "format") ] }

• Limited multi-master support

• Windows support
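One way to read the terser syntax listed above is as a rewrite rule: each leading token is a decorator wrapped around everything to its right, and the last token is the sink. The sketch below is not Flume's own parser (Flume is written in Java); it is a hypothetical Python helper, expand_terse, that assumes whitespace-separated tokens with the sink last, so decorators whose arguments contain spaces are not handled.

```python
def expand_terse(spec: str) -> str:
    """Rewrite a terse decorator chain into the nested { deco => sink } form.

    Hypothetical illustration only, not Flume's parser. Assumes tokens are
    separated by whitespace, contain no spaces themselves, and the last
    token is the sink.
    """
    tokens = spec.split()
    if not tokens:
        raise ValueError("empty spec")
    *decos, result = tokens
    # Wrap from the innermost decorator outward, so the first decorator
    # written ends up as the outermost one.
    for deco in reversed(decos):
        result = "{ " + deco + " => " + result + " }"
    return result


print(expand_terse("deco1 deco2 sink"))
# prints { deco1 => { deco2 => sink } }
```

A sink with no decorators passes through unchanged, which matches the idea that the terse form is purely shorthand for the nested one.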

Flume Office Hours, 2/28/2011 9

STORIES

Flume: The Standard Use Case

[Diagram: rows of servers, each running a Flume agent (agent tier), send events to three collectors (collector tier), which write to HDFS; a Flume Master is also shown.]

Flume: Multi Datacenter

[Diagram: API servers and processor servers, each running a Flume agent, send events to a collector tier of six collectors, which write to HDFS.]

Flume: Multi Datacenter (with relay)

[Diagram: API servers and processor servers, each running a Flume agent, feed a six-collector tier that writes to HDFS, now with a Relay node added.]

Flume: Near Realtime Aggregator

[Diagram: ad servers (Ad svr) run Flume agents that send events to a Collector and a Tracker. Also shown: HDFS, a Hive job, a DB, and the labels "reports", "verify", and "quick reports".]

Flume: An enterprise story

[Diagram: Windows and Linux API servers run Flume agents that feed a collector tier of three collectors writing to Kerberos-secured HDFS; Active Directory / LDAP is also shown.]

Flume: An emerging community story

[Diagram: servers (svr) run Flume agents that send events to a collector, which fans out to HDFS, HBase, and an incremental search index. Query types shown: Hive query, Pig query, key lookup, range query, search query, and faceted query.]
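In the community story, one collector fans each event out to several stores, the same idea as the [ sink1, sink2 ] list in the multiple collector sink example. The following is a toy Python model of that fanout, not Flume's Java API: the names fanout, Event, and Sink are hypothetical, and in-memory lists stand in for HDFS, HBase, and the search index.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: an event is a plain dict, and a sink is any
# callable that accepts one event. These are not Flume's Java types.
Event = Dict[str, Any]
Sink = Callable[[Event], None]


def fanout(*sinks: Sink) -> Sink:
    """Return a sink that delivers every event to each child sink, in order."""
    def append(event: Event) -> None:
        for sink in sinks:
            sink(event)
    return append


# In-memory lists standing in for HDFS, HBase and the incremental search index.
hdfs: List[Event] = []
hbase: List[Event] = []
search_index: List[Event] = []

collector_out = fanout(hdfs.append, hbase.append, search_index.append)
collector_out({"host": "svr1", "body": "example log line"})
collector_out({"host": "svr2", "body": "another log line"})

print(len(hdfs), len(hbase), len(search_index))  # prints 2 2 2
```

A real fanout also has to decide what happens when one child fails; in this sketch an exception from one sink simply propagates and the remaining sinks are skipped for that event.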

WHAT NEEDS WORK? WHAT COMES NEXT?

Known issues

• Excessive event duplication (caused by the tail source or the end-to-end (E2E) agent)
• Configuration translation problems in some cases
• Limited multi-master support: it doesn’t work with translations
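One plausible way an end-to-end (acknowledged) path produces duplicates is a retry after a lost acknowledgement: the collector has already written the event, but the agent never hears so and sends it again. The slide does not spell out the mechanism, so this Python sketch is an assumption-laden illustration built around a hypothetical deliver_e2e function; duplication caused by the tail source is not modeled.

```python
from typing import List, Set, Tuple


def deliver_e2e(events: List[str], lost_acks: Set[Tuple[str, int]]) -> List[str]:
    """Toy at-least-once delivery (hypothetical, not Flume code).

    Each event is retried until its acknowledgement reaches the agent.
    lost_acks holds (event, attempt) pairs whose ack is lost; the collector
    has already written the event by then, so the retry writes it again.
    """
    written: List[str] = []
    for event in events:
        attempt = 0
        while True:
            attempt += 1
            written.append(event)  # collector commits the event downstream
            if (event, attempt) not in lost_acks:
                break  # ack reached the agent, so no retry
    return written


print(deliver_e2e(["e1", "e2", "e3"], lost_acks={("e2", 1)}))
# prints ['e1', 'e2', 'e2', 'e3']
```

Every event arrives at least once, but a single lost ack is enough to write an event twice, which is why reliable delivery and duplicate-free delivery pull in different directions.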

What’s next? (proposals)

• Fix excessive duplication issues
• Apache Incubator (?)
• Log4j/Log4net/logback/etc…
• Fix multi-master limitations
• Security upgrades for node-to-node comms (TLS/SSL)
• Improved metrics / GUI / usability
• Integration with open source alerting/monitoring tools
• Integration with proprietary systems
• Version-proofing RPCs / state storage
• Packaging-friendly plug-in install
• Multi-datacenter story
• Performance increases
• Inline near-realtime analytics
• Puppet/Chef-style config for nodes
• Lightweight agent
• Masterless agent
• Better S3 / AWS support

Q+A
