osmc 2014: processing millions of logs with logstash and integrating with elasticsearch, hadoop and...

34
About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra Valentin Fischer-Mitoiu, OSMC 2014 November 21, 2014 Valentin Fischer-Mitoiu, OSMC 2014 Processing millions of logs with Logstash

Upload: netways

Post on 02-Jul-2015

1.057 views

Category:

Software


6 download

DESCRIPTION

This presentation is meant to be an introduction in the world of log monitoring using Logstash and also a showcase for using Logstash in big data setups. You will be guided in using Logstash from scratch and end up with a complete and centralized setup that should cover all your logging needs. The presentation covers multiple software solutions like Logstash, Kibana, Elasticsearch, HDFS, Cassandra, HBase and KairosDB.

TRANSCRIPT

Page 1: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Processing millions of logs with Logstashand integrating with Elasticsearch, Hadoop and Cassandra

Valentin Fischer-Mitoiu, OSMC 2014

November 21, 2014

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 2: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

About me

My name is Valentin Fischer-Mitoiu and I work for theUniversity of Vienna.

More specificaly in a group called Domainis (Internet domainadministration).

We operate the nameservers, monitoring and much more fornic.at, the .at registry.

I’m a monitoring guy and sys admin, working daily with systemslike Logstash, Icinga, Elasticsearch, Hadoop, Cassandra.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 3: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Scope of this presentation

1. This presentation is meant as a walk-through for startingmonitoring logs using logstash.

2. Describing possible integration of logstash in ”big data”environments.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 4: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: What is it?

Logstash is a software that allows you to send, receive andmanage log files from various sources.

It appeared in 2011 and it is getting a lot of interest because of itsflexibility and power of configuration.

Based on Ruby, works as an agent/daemon and provides a highlevel of abstractization for its configuration, similar to puppet.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 5: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: How we use it

Logstash: How we use it

We started by testing it and see what types of logs we can managewith it. The initial setup was processing logs related to dns, login,postresql, soap, epp, and system.

We started using it in production and feeding logs from varioussources.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 6: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Why we use it

Logstash: Why we use it

We use logstash because we want to use its power in managinglogs and generate events based on specific triggers.

We cover states with icinga and adding logstash would cover thewhole monitoring spectrum.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 7: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Deploying fast and simple

Deploying logstash is quite simple, the challenge is in generatinga configuration.

In most cases it’s recommended to start with a centralized setup.

This means having a shipper, queue, indexer and some storagesystem.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 8: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Overview of a centralized setup

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 9: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Hardware requirements

Logstash: Hardware requirements

In order find out how much hardware you need for your logstashsetup, first think how much data you process daily and if youneed to search into it.

Another factor is the complexity of your configuration, speciallyfilters.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 10: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with inputs

Logstash: Working with inputs

This is how a file input looks like

1 i n pu t {f i l e {

3 codec => ” p l a i n ”path => ”/dns / l o g s / p r o c e s s i n g qu e u e /∗ . t a s k s ”

5 s t a r t p o s i t i o n => [ ” b eg i nn i n g ” ]type => ” p r o c e s s i n g t a s k s ”

7 }}

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 11: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with filters

Logstash: Working with filters

This is how a filter looks like

f i l t e r {2 grok {

add tag => [ ” p r o c e s s e d t a s k s ” ]4 match => [ ”message ” , ”%{IP : c l i e n t } %{WORD: method}

%{URIPATHPARAM: r e qu e s t } %{NUMBER: by t e s } %{NUMBER:du r a t i o n }” ]t a g o n f a i l u r e => [ ” f a i l e d t a k s ” ]

6 }}

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 12: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with outputs

Logstash: Working with outputs

This is how an output looks like

1 output {e l a s t i c s e a r c h {

3 c l u s t e r => ” s to rage−c l u s t e r ”codec => ” p l a i n ”

5 }}

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 13: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with Elasticsearch

Logstash: Working with Elasticsearch

Elasticsearch is distributed RESTful search and analyticsengine.

This gives logstash great searching power using the lucene syntax.

It is also a powerful storage system that can make use ofindexes, facets, templates and much more.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 14: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with Elasticsearch

Logstash: Elasticsearch overview

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 15: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with Kibana

Logstash: Working with Kibana

Kibana is a web interface used to search (query) theelasticsearch cluster.

The interface can be grouped in dashboards, each of onecontaining different layouts.

Each layout can have: pies, maps, charts, histograms, tables,trends, etc.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 16: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Working with Kibana

Logstash: Dashboard overview

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 17: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Deployment example

Logstash: Deployment example

You now know the basics, an actual deployment would look likethis.

1. Download a package from www.elasticsearch.org2. Setup your shipper3. Setup your queue4. Setup your indexer5. Setup your storage

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 18: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Setting up the shipper/forwarder

Logstash: Setting up your shipper/forwarder

A shipper/forwarder is an agent that usually contains some input(port, file, etc) that gets some data and sends it to an output (aqueue if using a centralized setup).

To setup your shipper you generally need an input and outputsection.

You can also do preprocessing in most shipping agents.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 19: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Shipping logs

Logstash: Shipping logs

Log Courier ( lightweight, fast, secure, stable )

Logstash ( heavy, fast, stable, well developed )

Logstash-forwarder ( fast, secure, well developed )

Nxlog ( no java, lightweight, mainly used for windows logs )

Beaver ( lightweight, fast , python)

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 20: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Setting up the indexer

Logstash: Setting up your indexer

An indexer is a logstash agent that contains some logic (filters)and does some processing on messages.

To setup your indexer you generally need an input, filter andoutput section.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 21: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Indexer example

Logstash: An example of a working indexer

An example of a running indexer.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 22: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Setting up the queue

Logstash: Setting up your queue

In order to have a more robust and centralized setup we decided touse a queue.

This will keep all your messages and wait for indexers to processthem.

The most used queue with logstash is redis and rabbitmq.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 23: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Setting up the storage cluster

Logstash: Setting up your storage cluster

In order to store all our logs, we will setup an elasticsearch cluster.We will use it also because of its advanced search capabilities.

Configuration is fairly simple. Edit elasticsearch.yml and makethe necessary changes.

After that, set the max RAM limit for your ES node(ES MAX MEM).

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 24: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Setting up the Kibana instance

Logstash: Setting up your storage cluster

Kibana is a tool used to visualize your data stored in Elasticsearch.

You can do complex queries using the lucene language and graphthe data in various charts, maps, etc.

e.g MESSAGES WHERE LOG SOURCE=(”shipper1”,”shipper2”)

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 25: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Debugging and errors

Logstash: Debugging and errors

Logstash is pretty clear and descriptive when it comes to errors butyou can always troubleshoot more using debug mode (-debug).

Another useful tip is sending your logs to stdout output and seemore information if you have issues with your messages format.

To see all the flags just use -h or –help.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 26: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Getting help

Logstash: Getting help

Usually if you need help you can enter IRC and channel #logstashon network Freenode. You will find a large community and peoplewilling to help you.

I’m also present there with nickname vali.

You can also visit the official documentation present onhttp://logstash.net

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 27: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Developing your own add-ons

Logstash: Developing your own add-ons

To develop your own addon (input,filter,output) you need rubyknowledge.

You can find more information athttp://logstash.net/docs/1.4.2/extending/

Basically you follow a standard structure, add your methods andrequired arguments. Do some processing with the incoming event,modify it or return a value.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 28: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Integrating with big data systems

In our case, integrating with big data systems was done viaKairosDB. This acts as a middleware and allows us to use asstorage system Hadoop or Cassandra.

In our case the sending was done via the opentsdb output.

And custom scripts sending directly to KairosDB via a TCP port.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 29: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Using Hadoop

Logstash: Using Hadoop

The target is to develop a big data system that allows use to storeevents/data from various systems/sources forever.

The system has to be able to be queried using a simple interfaceand its data returned in a useful format (json) and be graphedeasily.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 30: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Using Cassandra

Logstash: Using Cassandra

The same idea as using HBase + HDFS but exploring differentstorage systems.

We can still use Hive, Pig and MapReduce queries.

More simple to setup

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 31: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Using Elasticsearch

Logstash: Using Elasticsearch

Elasticsearch has the same characteristics as other NOSQLsystems.

Some advantages over casssandra and hadoop:- direct access from logstash, no need for any other middleware.- great web interface to allow easy access to the data andexcellent graphing options.- powerful API to retrieve data

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 32: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Logstash: Conclusion

In our case logstash seems to be a great candidate for coveringthe log management spectrum.

Its ease of usage, great collection of plugins and easyconfiguration makes it a great tool for managing logs.

It is definitely a tool worth exploring for this kind of monitoring.

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 33: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

Questions

This would be a great time to ask questions. :)

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash

Page 34: OSMC 2014: Processing millions of logs with Logstash and integrating with Elasticsearch, Hadoop and Cassandra | Valentin Fischer-Mitoiu

About me Scope of this presentation Logstash: What? When? How? Logstash: Deploying fast and simple Logstash: Deploying fast and simple Big data integration Conclusion Questions The end

The end

Thank you!

Valentin Fischer-Mitoiu, OSMC 2014

Processing millions of logs with Logstash