Introduction to Flume

Upload: knowbigdata

Post on 15-Aug-2015

Category:

Education

TRANSCRIPT

www.KnowBigData.com Hadoop

FLUME

A distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store.

Supports a large variety of sources, including:
• tail (like Unix tail -f)
• syslog
• log4j, allowing Java applications to write logs to HDFS via Flume

Flume nodes can be arranged in arbitrary topologies. Typically there is a node running on each source machine, with tiers of aggregating nodes that the data flows through on its way to HDFS.

Delivery reliability:
• best-effort delivery, which does not tolerate any node failures
• end-to-end delivery, which guarantees delivery even when nodes fail

Flume Example: Read the data at a port and push it to HDFS

Step 0. flume.properties - Download (Also in sgiri/flume)

# Name the components on this agent
a1.sources = r1
a1.sinks = s1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.s1.type = hdfs
a1.sinks.s1.hdfs.path = hdfs://hadoop1.knowbigdata.com/user/student/sgiri/flume/webdata

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
#a1.sinks.k1.channel = c1
a1.sinks.s1.channel = c1
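A common mistake in a file like this is a source or sink bound to a channel name that was never declared. The following is a minimal Python sketch, not part of Flume, that checks the wiring of the properties above. The helper names parse_properties and check_wiring are hypothetical, and the parser handles only simple key = value lines with # comments.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in how sources and sinks are bound to channels."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())

    # A source may feed several channels (space-separated 'channels' key).
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} has no channels")
        for channel in bound:
            if channel not in channels:
                problems.append(
                    f"source {source} is bound to undeclared channel {channel}")

    # A sink drains exactly one channel (singular 'channel' key).
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no channel")
        elif channel not in channels:
            problems.append(
                f"sink {sink} is bound to undeclared channel {channel}")
    return problems


# The configuration from the slides, reproduced for the check.
CONFIG = """
a1.sources = r1
a1.sinks = s1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
#a1.sinks.k1.type = logger
a1.sinks.s1.type = hdfs
a1.sinks.s1.hdfs.path = hdfs://hadoop1.knowbigdata.com/user/student/sgiri/flume/webdata
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
#a1.sinks.k1.channel = c1
a1.sinks.s1.channel = c1
"""

print(check_wiring(parse_properties(CONFIG), "a1"))
```

Note that the source uses the plural key channels while the sink uses the singular key channel; the sketch checks each accordingly.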

1. Start The Agent

flume-ng agent --conf conf --conf-file conf/flume.properties \
    --name a1 -Dflume.root.logger=INFO,console

2. Generate Some Data

telnet localhost 44444
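If telnet is not available, the same test data can be sent from a short script. This is a minimal Python sketch that assumes the agent is listening on localhost:44444 as configured above, and that the netcat source is left at its default of acknowledging each event with a one-line reply (normally OK). The function name send_lines is hypothetical.

```python
import socket


def send_lines(lines, host="localhost", port=44444, timeout=5.0):
    """Send each line as one event and collect the source's one-line replies."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # The netcat source treats each newline-terminated line as an event.
                conn.sendall((line + "\n").encode("utf-8"))
                # Assumption: the source acknowledges every event with one line.
                replies.append(reader.readline().strip())
    return replies


# Example call, with the agent from step 1 running:
#   send_lines(["hello flume", "second event"])
```

If acknowledgements have been turned off in the source configuration, the read will wait until the timeout expires, so this sketch fits the default settings only.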

3. Check the HDFS

/user/student/sgiri/flume/webdata

More Sinks