flume office-hours-110228

Upload: cloudera-inc

Post on 20-Aug-2015


Flume Office Hours: Community planning

Jonathan Hsieh, Cloudera HQ, 2/28/2011

Flume Office Hours, 2/28/2011 3

Outline

• State of the world
• What’s new?
• Stories (Chime in!)
• What needs work?
• Prioritizing what is next.
• Q+A

STATE OF THE WORLD

Growing user and developer community

• GitHub stats:
  – Currently 295 watchers, 51 forks
• New committers:
  – 9/10: Eric Sammer (Cloudera)
  – 1/11: Bruce Mitchener (Independent)
• User characteristics
  – Most potential users seem to use ad hoc scripts
  – Most users are early adopters / startup devops

[Chart: GitHub watchers (axis 0 to 350) and forks (axis 0 to 60), May-10 through Feb-11]

A short feature history

• 6/10: v0.9.0
  – Initial open source release
• 8/10: v0.9.1
  – Fixes for hangs
  – Initial compression features
• 10/10: v0.9.1+29 (CDH3b3, packages)
  – Added kerberized HDFS support
  – Flume cookbook
  – Elastic Search / Cassandra plugins
  – Initial Voldemort plugins
• 11/10: v0.9.2
  – Support for other compression codecs
  – Avro RPC
  – Improvements to tail and exec
  – Robustness improvements
  – Initial HBase / MongoDB plugin
• 2/11: v0.9.3 (CDH3b4, packages)
  – Flume Node Windows support
  – Initial JSON metrics support
  – Multi-master functional
  – Robustness improvements
  – JRuby / AMQP plugins
  – S3/EC2 blog stories
• 4/11: v0.9.3+xxx (CDH3 Stable, packages)
  – Excessive duplication fixes
  – Compression fixes
• ?/11: v0.9.4

WHAT’S NEW?

New features

• Flume node JSON metrics
  – http://node:35862/node/reports
• Terser syntax
  – Before: { deco1 => { deco2 => sink } }
  – Now: deco1 deco2 sink
• Multiple collector sink support
    collector(30000) { [
      escapedCustomDfs("hdfs://nn1/path","prefix","format"),
      escapedCustomDfs("hdfs://nn2/path","prefix","format")
    ] }
• Limited multi-master support
• Windows support
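The JSON metrics endpoint listed above can be polled with any HTTP client. The following is a minimal Python sketch, assuming the endpoint returns a single JSON document. The slide does not show the response schema, so the decoded value is returned unchanged and no keys are assumed. The function names and the injectable `opener` parameter are hypothetical; only the port and path come from the slide.

```python
import json
import urllib.request


def reports_url(host, port=35862):
    """Build the metrics URL; the port and path are the ones shown on the slide."""
    return f"http://{host}:{port}/node/reports"


def fetch_node_reports(host, port=35862, timeout=5.0,
                       opener=urllib.request.urlopen):
    """Fetch and decode a node's JSON metrics.

    `opener` is injectable so the function can be exercised without a live
    Flume node. The response schema is not given in the source, so the
    decoded JSON is returned as-is.
    """
    with opener(reports_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_node_reports("node")` would request `http://node:35862/node/reports`, where `node` stands for the hostname of a running Flume node.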

STORIES

Flume: The Standard Use Case

[Diagram. Labels: server (×12), Agent (×12, agent tier), Collector (×3, collector tier), Master, HDFS]

Flume: Multi Datacenter

[Diagram. Labels: API server (api), Processor server (proc), Agent, Collector tier, Collector (×6), HDFS]

Flume: Multi Datacenter

[Diagram. Labels: API server (api), Processor server (proc), Agent, Collector tier, Collector (×6), Relay, HDFS]

Flume: Near Realtime Aggregator

[Diagram. Labels: Ad svr (×4), Agent (×4), Collector, Tracker, DB, HDFS, Hive job, reports, quick reports, verify]

Flume: An enterprise story

[Diagram. Labels: API server (api), Agent (Win), Agent (Linux), Collector tier, Collector (×3), Kerberos HDFS, D (×6), Active Directory / LDAP]

Flume: An emerging community story

[Diagram. Labels: svr, Agent (×4), Collector, Fanout, HDFS (hdfs), HBase (hbase), Incremental Search Idx (index), Hive query, Pig query, Key lookup, Range query, Search query, Faceted query]
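The Collector and Fanout labels in this story suggest that each collected event is delivered to several stores (HDFS, HBase, and a search index). The idea can be sketched in a few lines of Python. This is an illustration of the fan-out pattern only, not Flume's implementation; `ListSink` and `FanoutSink` are hypothetical names, and the in-memory lists stand in for the real stores.

```python
class ListSink:
    """Toy sink that records events in memory (stands in for a real store)."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def append(self, event):
        self.events.append(event)


class FanoutSink:
    """Deliver every event to each child sink, in the order given."""

    def __init__(self, *sinks):
        self.sinks = list(sinks)

    def append(self, event):
        for sink in self.sinks:
            sink.append(event)


hdfs, hbase, index = ListSink("hdfs"), ListSink("hbase"), ListSink("index")
fanout = FanoutSink(hdfs, hbase, index)
for line in ["GET /a", "GET /b"]:
    fanout.append(line)
```

After the loop, all three sinks hold the same two events, which is what lets the same stream serve batch queries, key lookups, and search.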

WHAT NEEDS WORK? WHAT COMES NEXT?

Known issues

• Excessive event duplication (due to tail or e2e agent)
• Configuration translation problem in some cases
• Multi-master limited: doesn’t work with translations
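The slide attributes duplication to tail or the e2e agent without further detail. As a general illustration, and not a description of the Flume bug itself, the Python sketch below assumes one common mechanism: a sender that retries after a lost acknowledgement (at-least-once delivery) makes the receiver see the same event twice. The event IDs, `send_with_retry`, and `dedupe` are hypothetical names introduced for this example.

```python
def send_with_retry(events, deliver, ack_lost):
    """Resend each event until an ack is seen (at-least-once delivery).

    `deliver(event)` hands the event to the receiver.
    `ack_lost(event, attempt)` reports whether the ack for that attempt
    was lost in transit, which forces a resend.
    """
    for event in events:
        attempt = 0
        while True:
            deliver(event)
            if not ack_lost(event, attempt):
                break
            attempt += 1


def dedupe(events):
    """Drop repeats by event id, keeping first-seen order."""
    seen = set()
    unique = []
    for event_id, body in events:
        if event_id not in seen:
            seen.add(event_id)
            unique.append((event_id, body))
    return unique


received = []
events = [(1, "a"), (2, "b"), (3, "c")]
# Simulate losing only the first ack for event 2.
send_with_retry(events, received.append,
                lambda ev, attempt: ev[0] == 2 and attempt == 0)
```

Here `received` contains event 2 twice, and `dedupe(received)` recovers the original three events. Downstream deduplication of this kind requires every event to carry a stable identifier.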

What’s next? (proposals)

• Fix excessive duplication issues.
• Apache Incubator (?)
• Log4j/Log4net/logback/etc…
• Fix multi-master limitations.
• Security upgrades for node to node comms (TLS/SSL)
• Improved metrics / GUI / usability
• Integration with open source alerting/monitoring tools
• Integration with proprietary systems
• Version proofing RPCs / state storage
• Packaging friendly plug-in install
• Multi Datacenter story
• Performance increases
• Inline near-realtime analytics
• Puppet/Chef style config for nodes
• Lightweight Agent
• Masterless Agent
• Better S3 / AWS support

Q+A
