Monitoring Using Open Source Technologies

Upload: utkarsh-bhatnagar

Post on 13-Apr-2017

TRANSCRIPT

Page 1: Monitoring using Open source technologies

Monitoring Using

Open Source Technologies

Page 2: Monitoring using Open source technologies

Utkarsh Bhatnagar

• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation)
• An active contributor to Grafana
• Project initiator for wizzy – a user-friendly CLI tool for Grafana

GitHub - https://github.com/utkarshcmu

Email – [email protected]

GrafanaCon 2016 Speaker - https://www.youtube.com/watch?v=llRhdvV25rg

Page 3: Monitoring using Open source technologies

Monitoring using

@

Page 4: Monitoring using Open source technologies

PlayStation Outage!

Page 5: Monitoring using Open source technologies

Hi, I am Jack.

About 2 years ago…

Page 6: Monitoring using Open source technologies

POC on Monitoring

Requirements:

• 50,000 unique metrics from one source
• Data points every minute
• Roughly 72 million data points per day
• Data retention of 60 days
• User-friendly UI with possible customization
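The daily data point figure follows from the other requirements: 50,000 metrics, each reporting once a minute, over the 1,440 minutes in a day. Here is a small sketch of that arithmetic, extended to the 60-day retention window. The variable names are illustrative, and the retention total assumes every point is kept at full resolution, which the slides do not state.

```python
# Sizing arithmetic for the POC requirements listed above.
UNIQUE_METRICS = 50_000
POINTS_PER_METRIC_PER_MINUTE = 1
MINUTES_PER_DAY = 24 * 60
RETENTION_DAYS = 60

points_per_day = UNIQUE_METRICS * POINTS_PER_METRIC_PER_MINUTE * MINUTES_PER_DAY

# Assumption: no downsampling over the retention window.
points_retained = points_per_day * RETENTION_DAYS

print(f"{points_per_day:,} data points per day")
print(f"{points_retained:,} data points held over {RETENTION_DAYS} days")
```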

Page 7: Monitoring using Open source technologies

Monitoring Stack

METRIC SOURCE → Time Series Database → Visualization Layer

Page 8: Monitoring using Open source technologies

Choosing the technology!

Page 9: Monitoring using Open source technologies

POC Design & Architecture

METRIC SOURCE

Page 10: Monitoring using Open source technologies

POC Completed!

Mission accomplished!

1 metric source
50,000 unique metrics
72 million data points per day

Page 11: Monitoring using Open source technologies

Metrics Onboarding

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 400,000 unique metrics
• About 600 million data points per day

Team 3 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 4 Requirements:
• 800,000 unique metrics
• About 5 billion data points per day

And more…

Page 12: Monitoring using Open source technologies

POC Design & Architecture

METRIC SOURCE

Page 13: Monitoring using Open source technologies

How to Scale?

Should he continue with Graphite?
Should he ask teams to reduce metrics or data points?
How to dynamically scale Graphite?
Does Grafana support other datasources?
OpenTSDB / InfluxDB / KairosDB / Prometheus?
How to scale the infrastructure to support a variable load of metrics?

Challenges:
• Multiple teams
• Millions of unique metrics
• Above 10 billion data points a day
• Process 3 million logs every minute and generate metrics
• Reprocessing of metrics and logs if needed
• Provide real-time monitoring for all of the above using Grafana!

Page 14: Monitoring using Open source technologies

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Page 15: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

POC works for:

1 metric source
50,000 unique metrics
72 million data points per day

Team 1 requirements:

1 metric source
100,000 unique metrics
200 million data points per day

TEAM 1 METRIC SOURCE

Page 16: Monitoring using Open source technologies

Team 1 Conquered!

This strategy works! Bring it on!

Page 17: Monitoring using Open source technologies

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Page 18: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Page 19: Monitoring using Open source technologies

Team 2 Conquered!

Page 20: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

Page 21: Monitoring using Open source technologies
Page 22: Monitoring using Open source technologies

Scaling Graphite

Clustering Graphite

CARBON RELAY

CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
. . .

GRAPHITE WEB    GRAPHITE WEB

LOAD BALANCER
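In a cluster like this, the relay has to decide which carbon-cache node stores each metric. Graphite's carbon-relay offers a consistent-hashing mode for that purpose. The sketch below is a simplified hash ring that illustrates the idea only; it is not carbon's actual implementation, and the node names and replica count are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring mapping metric names to storage nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` positions on the ring to even out the spread.
        self._ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # md5 serves only as a stable, well-distributed hash, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, metric):
        # Walk clockwise to the first node position at or after the metric's hash,
        # wrapping around to the start of the ring if needed.
        index = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[index % len(self._ring)][1]


# Hypothetical node names standing in for the storage nodes on the slide.
ring = HashRing(["graphite-a", "graphite-b", "graphite-c"])
metrics = [f"team2.service{i}.latency" for i in range(1000)]
placement = {m: ring.node_for(m) for m in metrics}

# Adding a node should only pull metrics onto the new node.
bigger_ring = HashRing(["graphite-a", "graphite-b", "graphite-c", "graphite-d"])
moved = [m for m in metrics if bigger_ring.node_for(m) != placement[m]]
print(f"{len(moved)} of {len(metrics)} metrics would move to the new node")
```

The property that matters for scaling is visible in the last lines: when a node is added, the only metrics that change owner are the ones that land on the new node, so the rest of the cluster keeps writing to the same place.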

Page 23: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 24: Monitoring using Open source technologies

Team 2 Conquered!

But… happiness lasted only a month.

Page 25: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 26: Monitoring using Open source technologies

Scalable Alternatives To Graphite

Page 27: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 28: Monitoring using Open source technologies

Team 2 Conquered!

Finally!

Page 29: Monitoring using Open source technologies

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

Page 30: Monitoring using Open source technologies

How to process logs at scale?

Page 31: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

Team 3 requirements:

Over 5000 log sources
3 million logs per minute

TEAM 2 METRIC SOURCE

LOGS SOURCES
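One common way to turn logs into metrics is to bucket log events by minute and count them per key. The slides do not name the technology used for this step, so the sketch below only illustrates the idea on a single machine; the log line format and the metric naming scheme are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(epoch_seconds):
    """Round a Unix timestamp down to the start of its minute."""
    return epoch_seconds - (epoch_seconds % 60)


def logs_to_metrics(lines):
    """Count log lines per (minute, service, level).

    Assumes a hypothetical format: '<ISO-8601 UTC time> <service> <level> <message>'.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # skip malformed lines rather than fail the whole batch
        stamp, service, level = parts[0], parts[1], parts[2]
        ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        counts[(minute_bucket(int(ts.timestamp())), service, level)] += 1
    return counts


sample = [
    "2017-04-13T10:00:05Z auth ERROR login failed",
    "2017-04-13T10:00:40Z auth ERROR login failed",
    "2017-04-13T10:01:02Z auth INFO login ok",
]
metrics = logs_to_metrics(sample)
for (minute, service, level), count in sorted(metrics.items()):
    # One data point per minute bucket: '<path> <value> <timestamp>'
    print(f"logs.{service}.{level.lower()}.count {count} {minute}")
```

At 3 million logs a minute from over 5000 sources, the same counting would have to be spread across many workers, but the per-minute aggregation logic stays the same.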

Page 32: Monitoring using Open source technologies

Team 3 Conquered!

But… one day…

Page 33: Monitoring using Open source technologies

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

LOGS SOURCES

Page 34: Monitoring using Open source technologies

Design & Architecture

METRIC SOURCE 1

METRIC SOURCE 2

METRIC SOURCE 3

METRIC SOURCE N

LOGS SOURCES

LB

Alerting

Page 35: Monitoring using Open source technologies

Metrics & Logs Sources

Graphite Stats
- Apps using a stats library written by Alexander Filipchik

Custom metrics
- From other sources
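For custom sources that feed Graphite directly, carbon accepts a plaintext protocol: one newline-terminated line per data point in the form `<path> <value> <timestamp>`, on TCP port 2003 by default. Below is a minimal sketch of a sender. It is not the stats library mentioned above, and the metric path and host are hypothetical.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one Graphite plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send already formatted lines to a carbon listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


# Hypothetical metric path; nothing is sent until send_metrics() is called,
# for example send_metrics([line], "graphite.example.internal").
line = format_metric("team1.checkout.requests.count", 42, timestamp=1492077600)
print(line, end="")
```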

Page 36: Monitoring using Open source technologies

Lessons Learned

Page 37: Monitoring using Open source technologies

Strategy

Divide & Conquer

Page 38: Monitoring using Open source technologies
Page 39: Monitoring using Open source technologies

Look for alternatives!

Page 40: Monitoring using Open source technologies

Choose scalable components!

(Subject to effort and time)

Page 41: Monitoring using Open source technologies

Automation

Page 42: Monitoring using Open source technologies

Design & Architecture

METRIC SOURCE 1

METRIC SOURCE 2

METRIC SOURCE 3

METRIC SOURCE N

LOGS SOURCES

LB

Alerting

Page 43: Monitoring using Open source technologies

Some numbers

• More than 3 million unique metrics supported
  - creation and deletion happen all the time
• More than 11 billion data points written per day
  - across all TSDBs
• Processing about 40 billion events per day
  - logs and metrics events in near real time (within 30 seconds)
• More than 3000 requests per minute to Grafana dashboards
  - around 7000 requests during outages
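To get a feel for these volumes, the daily totals can be converted into average per-second rates. This is a rough sketch: it treats the "more than" and "about" figures as exact and assumes load is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day):
    """Average rate per second for a daily total, assuming an even spread."""
    return per_day / SECONDS_PER_DAY


daily_totals = {
    "data points written": 11_000_000_000,
    "events processed": 40_000_000_000,
}
for name, total in daily_totals.items():
    print(f"{name}: about {per_second(total):,.0f} per second on average")

# Dashboard traffic is already quoted per minute on the slide.
dashboard_requests_per_minute = 3_000
print(f"dashboard requests: about {dashboard_requests_per_minute / 60:.0f} per second")
```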

Page 44: Monitoring using Open source technologies

Monitoring Stack @ Sony PlayStation

METRIC SOURCE 1

METRIC SOURCE 2

METRIC SOURCE 3

METRIC SOURCE N

LOGS SOURCES

LB

Alerting

Page 45: Monitoring using Open source technologies

Grafana

A metrics visualization and alerting tool

Page 46: Monitoring using Open source technologies

Supports multiple time series databases

Page 47: Monitoring using Open source technologies

Supports multiple panel types

https://grafana.net/plugins

Page 48: Monitoring using Open source technologies

Supports multiple notification channels for alerting

Page 49: Monitoring using Open source technologies

Other features…

• Alert lists

• Drilldown links

• Template variables

• Dashboard snapshots

• Grafana.net community

• Grafana CLI

Page 50: Monitoring using Open source technologies

http://grafana.org/

http://docs.grafana.org/

https://github.com/grafana/grafana

https://raintank.slack.com

Grafana links!

Page 51: Monitoring using Open source technologies
Page 52: Monitoring using Open source technologies

• Move
• Copy
• Extract
• Insert
• Remove

• Rows
• Panels
• Template variables
• Dashboard tags

Page 53: Monitoring using Open source technologies

• Dashboards
• Datasources
• Orgs
• Rows
• Panels
• Template variables
• Dashboard tags

Version Control

Page 54: Monitoring using Open source technologies

• Production
• Staging
• Testing
• Development

Grafana in multiple environments

Page 55: Monitoring using Open source technologies

• Last 24 hours
• By a dashboard tag
• Customized dashboard list

Generate GIFs of important dashboards

Page 56: Monitoring using Open source technologies

Generate GIFs of important dashboards

Page 57: Monitoring using Open source technologies

• Upload/Store/Download dashboards to/in/from AWS S3 respectively.

• Search/Download community dashboards from Grafana.net

External features

Page 58: Monitoring using Open source technologies
Page 59: Monitoring using Open source technologies

https://utkarshcmu.github.io/wizzy-site/

https://utkarshcmu.github.io/wizzy-site/home/

https://github.com/utkarshcmu/wizzy

https://raintank.slack.com/messages/wizzy/

wizzy links!

Page 60: Monitoring using Open source technologies

Utkarsh Bhatnagar

• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation)
• An active contributor to Grafana
• Project initiator for wizzy – a user-friendly CLI tool for Grafana

GitHub - https://github.com/utkarshcmu

Email – [email protected]

GrafanaCon 2016 Speaker - https://www.youtube.com/watch?v=llRhdvV25rg