Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015



DICE Horizon 2020 Project, Grant Agreement no. 644869, http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union

Monitoring in Big Data Frameworks

Gabriel Iuhasz
Institute e-Austria Timisoara
26 November 2015


Overview

o Introduction
o Cloud Computing and Big Data
o Monitoring Tools
o Monitoring Requirements and Solutions
o Conclusions


Introduction
o Big Data in Cloud Computing
  o Volume, Velocity, Variety and Veracity
  o Cost reduction, rapid provisioning/time to market, flexibility/scalability
o DevOps and Cloud
  o Development and Operations
  o Communication, collaboration, integration, automation
o DevOps Monitoring
  o Measurement is a key aspect of DevOps


Big Data in Cloud Computing

o Challenges of Big Data on Cloud
  o Low-latency real-time data
  o Virtualization overhead
  o Multi-tenancy overhead
  o Scalability
  o Lack of RDBMS support
  o Availability
  o Data integrity/privacy


Hadoop Ecosystem


Cloudera


HortonWorks


Monitoring Architecture
o Cross-layer monitoring of big data platforms
  o Types of metrics are highly dependent on the type of the application
  o Have to be decided on a platform/application basis
o Centralized Monitoring
  o All resource states are sent to a centralized monitoring server
  o Metrics are continuously polled from monitored components
  o Single point of failure
  o Lacks scalability
o Decentralized Monitoring
  o No single point of failure
  o Central authority is diffused
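To make the centralized/decentralized trade-off concrete, here is a minimal Python sketch. The `MetricSource` type and the `poll_all` and `poll_partitioned` functions are hypothetical illustrations, not part of any tool discussed in this talk; in a real decentralized deployment each partition would be polled by a separate process on a separate machine, rather than in one loop as simulated here.

```python
from typing import Callable, Dict, List

# Hypothetical type: a monitored component exposes a zero-argument function
# returning its current metrics as a name -> value mapping.
MetricSource = Callable[[], Dict[str, float]]


def poll_all(sources: Dict[str, MetricSource]) -> Dict[str, Dict[str, float]]:
    """One polling round of a centralized monitoring server.

    A single process queries every component, so it is a single point of
    failure and its work grows with the number of monitored components.
    Components that fail to answer are recorded with an empty mapping.
    """
    state: Dict[str, Dict[str, float]] = {}
    for name, source in sources.items():
        try:
            state[name] = source()
        except Exception:
            state[name] = {}  # unreachable component: no data this round
    return state


def poll_partitioned(
    sources: Dict[str, MetricSource], n_pollers: int
) -> List[Dict[str, Dict[str, float]]]:
    """Decentralized variant (simulated): components are split across
    n_pollers independent pollers, so losing one poller only loses
    its own share of the components."""
    names = sorted(sources)
    partitions = [names[i::n_pollers] for i in range(n_pollers)]
    return [poll_all({n: sources[n] for n in part}) for part in partitions]
```

The partitioning here is a simple round-robin over sorted component names; any assignment that gives each poller a disjoint subset would illustrate the same point.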


Tools
o Hadoop Performance Monitoring UI
  o Lightweight monitoring UI for Hadoop servers
  o Uses Hadoop metrics (via metrics sinks)
o SequenceIQ
  o Based on the ELK stack and Docker containers
  o Elasticsearch can easily be scaled horizontally
  o Logstash server on the client side
o Ganglia
  o Scalable distributed monitoring system
  o Low per-node overhead
  o Focused on system metrics
  o gmond, gmetad and web front-end
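Ganglia's gmond daemon, by default, writes an XML dump of its cluster state to any client that connects on TCP port 8649. The sketch below, assuming that default port and the usual `HOST`/`METRIC` element layout with `NAME` and `VAL` attributes, fetches and flattens that dump; the function names are illustrative, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Dict, Union


def fetch_gmond_xml(host: str = "localhost", port: int = 8649, timeout: float = 5.0) -> bytes:
    """Read the XML dump gmond sends to a client on its TCP port (8649 assumed)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_gmond_xml(xml_data: Union[bytes, str]) -> Dict[str, Dict[str, str]]:
    """Map each HOST NAME to a {METRIC NAME: VAL} dictionary.

    Values are kept as strings; the TYPE attribute tells how to convert them.
    """
    root = ET.fromstring(xml_data)
    hosts: Dict[str, Dict[str, str]] = {}
    for host in root.iter("HOST"):
        hosts[host.get("NAME")] = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
    return hosts
```

Usage, against a running gmond: `parse_gmond_xml(fetch_gmond_xml("node1"))`. Passing bytes lets the XML parser honour the encoding declared in the dump.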


Tools II
o Apache Chukwa
  o Built on top of HDFS
  o Easily scalable
  o Potentially high overhead
o Hadoop Vaidya
  o Rule-based diagnostic tool for MapReduce jobs
  o Performs post-run analysis of job results
o Nagios
  o Plugin-based architecture
  o Uses a centralized server to collect metrics
  o Hierarchical deployments are possible
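A Nagios plugin is simply a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); text after a `|` is treated as performance data. The sketch below is a hypothetical disk-usage check written to that convention; the thresholds and the `evaluate`/`main` names are assumptions for illustration.

```python
import shutil
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a measured percentage to a Nagios status code and output line."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Human-readable part before '|', performance data (value;warn;crit) after it.
    message = f"DISK {LABELS[code]} - {used_pct:.1f}% used | used_pct={used_pct:.1f}%;{warn};{crit}"
    return code, message


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Check disk usage of `path`, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(message)
    return code
```

To install it as a plugin, end the script with `sys.exit(main())` (after `import sys`) so that the Nagios server sees the exit code.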


Requirements
o Difficulties in cloud monitoring
  o Scale
  o Velocity or timeliness
  o Constant changes
o The need for scalability and automation
  o Easy reconfigurability
  o Lightweight metrics collectors
  o Identifying pertinent metrics
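One simple way to approach "identifying pertinent metrics" is to drop series that barely vary over an observation window, since they add collection and storage cost while carrying little diagnostic information. The sketch below uses a relative standard deviation threshold; this heuristic, the `pertinent_metrics` name and the 5% default are assumptions for illustration, not the selection method used by DICE.

```python
import statistics
from typing import Dict, List


def pertinent_metrics(series: Dict[str, List[float]], min_rel_std: float = 0.05) -> List[str]:
    """Return the names of metrics whose relative standard deviation
    (std / |mean|, or plain std when the mean is 0) exceeds min_rel_std.

    Series with fewer than two samples are skipped; constant series
    (including all-zero ones) are always dropped.
    """
    selected = []
    for name, values in series.items():
        if len(values) < 2:
            continue  # not enough samples to judge variability
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        scale = abs(mean) if mean != 0 else 1.0
        if std / scale > min_rel_std:
            selected.append(name)
    return sorted(selected)
```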


DICE Overview

Figure: DICE overview diagram (labels: Platform-Independent Model, Domain Models, Continuous Validation, Continuous Monitoring, Data Awareness, Architecture Model, Platform-Specific Model, Platform Description, DICE MARTE, Deployment & Continuous Integration, DICE IDE, Big Data, QA Models)


DICE Monitoring Platform
o RESTful Web Service
  o Used to deploy and configure all core/auxiliary components
  o Used to query Elasticsearch
  o Exports metrics in JSON, CSV and OSLC Performance Monitoring 2.0 (RDF+XML)
  o Used for auto-scaling of the monitoring solution
o ELK Stack
  o Extremely flexible/configurable
  o Horizontally scalable
  o Can accept various input and output formats
  o ETL via the Logstash server (filters)
  o Secure transmission via Logstash-forwarder (new: Beats data shippers)
  o Visualization using Kibana 4
o collectd
  o Statistics collection daemon
  o Many plugins available
  o Simple configuration
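Since the monitoring data ends up in Elasticsearch, recent metrics can be pulled through its `_search` REST endpoint and exported, for example, as CSV. This is a minimal sketch that queries Elasticsearch directly rather than through the DICE REST service; the base URL, the `logstash-*` index pattern and the field names are assumptions.

```python
import csv
import io
import json
import urllib.request
from typing import Any, Dict, List


def build_range_query(minutes: int, size: int = 100) -> Dict[str, Any]:
    """Query DSL body selecting documents from the last `minutes` minutes,
    assuming events carry the Logstash default `@timestamp` field."""
    return {
        "size": size,
        "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


def search(base_url: str, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the query to <base_url>/<index>/_search and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def hits_to_csv(response: Dict[str, Any], fields: List[str]) -> str:
    """Flatten hits.hits[*]._source into CSV; missing fields become empty cells
    and fields not listed are ignored."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for hit in response.get("hits", {}).get("hits", []):
        writer.writerow(hit.get("_source", {}))
    return out.getvalue()
```

Usage, assuming a local node: `hits_to_csv(search("http://localhost:9200", "logstash-*", build_range_query(15)), ["@timestamp", "host", "value"])`.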


DICE Monitoring Platform II


DICE Monitoring Platform Scaled


DICE Monitoring Platform Variant


Conclusions
o We have given a short overview of current monitoring platforms
o Identified key requirements for Big Data monitoring
  o Scaling, autonomy, timeliness
  o Automation via Chef recipes
o Presented the current architecture of the DICE Monitoring Platform
  o Currently collecting from: HDFS, YARN, Spark, Storm, Kafka
  o In the near future: Cassandra and possibly Trident
o Creating a full lambda-architecture-based anomaly detection platform
  o Elasticsearch used as the serving layer
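The slides do not describe the anomaly detection method itself. As an illustration only, the sketch below splits a simple z-score rule along lambda-architecture lines: a batch layer periodically recomputes a per-metric baseline from the full history, and a speed layer checks each fresh sample against the latest baseline. In a full deployment the baseline and the resulting flags would be stored in the serving layer (Elasticsearch here); the rule, the function names and the threshold k = 3 are assumptions.

```python
import statistics
from typing import Dict, List, Tuple

# metric name -> (mean, population standard deviation)
Baseline = Dict[str, Tuple[float, float]]


def batch_view(history: Dict[str, List[float]]) -> Baseline:
    """Batch layer: recompute the per-metric baseline over all stored samples."""
    return {
        name: (statistics.mean(values), statistics.pstdev(values))
        for name, values in history.items()
        if values
    }


def speed_check(baseline: Baseline, name: str, value: float, k: float = 3.0) -> bool:
    """Speed layer: flag a fresh sample lying more than k standard deviations
    from the batch mean. Metrics without a baseline are never flagged."""
    if name not in baseline:
        return False
    mean, std = baseline[name]
    if std == 0:
        return value != mean  # any change from a constant series is unusual
    return abs(value - mean) > k * std
```

Keeping the speed-layer check this cheap is what lets it run on every incoming sample, while the heavier baseline computation is left to the periodic batch job.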


Thank You!

Questions?