data-driven operations - practice realtime data analyse


Upload: guixing-bai

Post on 23-Jun-2015



DESCRIPTION

Grab data from any logs and operations in realtime. Gain the power to find problems instantly, and base all operations on data.

TRANSCRIPT

Page 1: Data-Driven Operations - Practice realtime data analyse

Data-Driven Operations - Practice realtime data analyse

@khsing

Page 2: Data-Driven Operations - Practice realtime data analyse

Who am I

• Currently, I am an operations architect at SINA.

• Focused on automation tools and DevOps methods.

Page 3: Data-Driven Operations - Practice realtime data analyse

What kind of data is useful for operations?

Page 4: Data-Driven Operations - Practice realtime data analyse

Before we talk data

Page 5: Data-Driven Operations - Practice realtime data analyse

What does a day in ops look like?

Page 6: Data-Driven Operations - Practice realtime data analyse

• Check the dashboard; everything looks good.

• Start work: write scripts or configurations.

• Suddenly, an alert arrives by SMS/email, or CS reports a problem.

• Switch to working on the event/problem/outage.

Page 7: Data-Driven Operations - Practice realtime data analyse

You are the Fireman http://www.flickr.com/photos/40699207@N05/3838012090/

Page 8: Data-Driven Operations - Practice realtime data analyse

Find the problem

• Look at the dashboard, Nagios, and other monitoring.

• grep logs from hundreds of hosts.

• Study the network diagram.

• Guess what is going wrong.

Page 9: Data-Driven Operations - Practice realtime data analyse

Driven by problems

Page 10: Data-Driven Operations - Practice realtime data analyse

Passive

Page 11: Data-Driven Operations - Practice realtime data analyse

Be Active

Page 12: Data-Driven Operations - Practice realtime data analyse

Let’s talk data

Page 13: Data-Driven Operations - Practice realtime data analyse

Data

• Logs

• Access log, error log, exception log, step log

• Configuration Change log, Release log

• Performance Measurement

• Product operations data.

Page 14: Data-Driven Operations - Practice realtime data analyse

Logs

• Success is useless.

• Error is useful.

Page 15: Data-Driven Operations - Practice realtime data analyse

Process logs

• Realtime or near-realtime processing brings big benefits.

• You can't afford to lose an hour when a problem actually happens.

• You have to detect a problem before too many users complain.
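
One way to picture near-realtime log processing is a sliding-window error counter fed line by line from a log stream. The sketch below is only an illustration, not the setup used in the talk: the class name `ErrorRateWindow`, the `" ERROR "` marker, and the window and threshold values are all assumptions.

```python
from collections import deque


class ErrorRateWindow:
    """Track how many error lines arrived in the last `window_seconds`."""

    def __init__(self, window_seconds=60, threshold=5):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()  # arrival times of recent error lines

    def observe(self, timestamp, line):
        """Feed one log line; return True when errors reach the threshold."""
        # Assumption: error lines contain the literal token " ERROR ".
        if " ERROR " in line:
            self._timestamps.append(timestamp)
        # Forget errors that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

Each call costs only a few deque operations, so a check like this can run on every incoming line instead of waiting for an hourly batch job.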

Page 16: Data-Driven Operations - Practice realtime data analyse

Process Logs

• Categorise automatically.
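
A common way to categorise log lines automatically is to mask the variable parts (IP addresses, numbers) so that lines with the same shape collapse into one template. The slides do not say which method was used; this is a minimal sketch of that masking idea, and the patterns and the names `template_of` and `categorise` are assumptions.

```python
import re
from collections import Counter

# Variable parts to mask, applied in order (IP first, so its digits
# are not consumed by the generic number pattern).
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(line):
    """Replace variable tokens so similar lines share one template."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def categorise(lines):
    """Count how many lines fall into each template."""
    return Counter(template_of(line) for line in lines)
```

Sorting the resulting counter by count turns thousands of raw lines into a short list of categories, which is the contrast between the "normal logs" and "categorised logs" slides that follow.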

Page 17: Data-Driven Operations - Practice realtime data analyse

Normal logs

Page 18: Data-Driven Operations - Practice realtime data analyse

Categorised logs

Page 19: Data-Driven Operations - Practice realtime data analyse

Performance Measurement

• How fast is our website for end users?

• Where do they come from?

• Which datacenter do they hit?

• What is the ratio of slow to fast users?
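
The slow/fast ratio can be computed per datacenter from page-load samples. The sketch below assumes each sample is a `(datacenter, load_time_ms)` pair and that "slow" means above a fixed threshold; the 1000 ms default and the function name are illustrative assumptions, not values from the talk.

```python
from collections import defaultdict


def slow_ratio_by_datacenter(samples, slow_threshold_ms=1000):
    """Return {datacenter: fraction of requests slower than the threshold}."""
    totals = defaultdict(int)
    slow = defaultdict(int)
    for datacenter, load_time_ms in samples:
        totals[datacenter] += 1
        if load_time_ms > slow_threshold_ms:
            slow[datacenter] += 1
    return {dc: slow[dc] / totals[dc] for dc in totals}
```

The same grouping works for the "where do they come from" question by swapping the datacenter key for a region or ISP key.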

Page 20: Data-Driven Operations - Practice realtime data analyse

Product Operations Data

• For example, DAU (daily active users).

• Drops, spikes, and increases are events that need action.
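
Turning a metric such as DAU into events needs some rule for what counts as a drop or a spike. A minimal sketch, assuming a comparison against the mean of recent values with a fixed relative tolerance; the 20% default and the name `classify_change` are hypothetical.

```python
from statistics import mean


def classify_change(history, current, tolerance=0.2):
    """Label `current` as 'spike', 'drop', or 'normal' against recent history."""
    baseline = mean(history)
    if baseline == 0:
        # No meaningful ratio; any positive value counts as a spike.
        return "spike" if current > 0 else "normal"
    change = (current - baseline) / baseline
    if change > tolerance:
        return "spike"
    if change < -tolerance:
        return "drop"
    return "normal"
```

A real deployment would probably account for daily and weekly seasonality; the fixed baseline here only shows the shape of the check.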

Page 21: Data-Driven Operations - Practice realtime data analyse

Change/Release log

• Many problems come with a change or release.

• You have to watch the data after you make a change or release.

• The change/release log has to be visible on the dashboard.
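
Because many problems follow a change, a useful first question when an alert fires is "what changed recently?". The sketch below assumes the change/release log is available as `(unix_timestamp, description)` pairs; the function name and the 30-minute default window are assumptions for illustration.

```python
def recent_changes(changes, event_time, window_seconds=1800):
    """Return changes made within `window_seconds` before `event_time`,
    newest first."""
    hits = [
        (ts, desc)
        for ts, desc in changes
        if 0 <= event_time - ts <= window_seconds
    ]
    return sorted(hits, reverse=True)
```

Overlaying the same timestamps as markers on the metric graphs gives the visual version of this lookup.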

Page 22: Data-Driven Operations - Practice realtime data analyse

Change/Release log

Page 23: Data-Driven Operations - Practice realtime data analyse

Be active

Page 24: Data-Driven Operations - Practice realtime data analyse

Don’t be defensive

Page 25: Data-Driven Operations - Practice realtime data analyse

Attack is the best form of defence

–Olbrich Desouza

Page 26: Data-Driven Operations - Practice realtime data analyse

Tools

• Splunk - commercial

• Logstash, ElasticSearch, Kibana

• Graphite

• StatsD
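
To show how little it takes to feed these tools, the sketch below builds metrics in the StatsD line format (`name:value|type`) and the Graphite plaintext format (`path value timestamp`). The metric names, the helper function names, and the host are assumptions; 8125 is the usual StatsD UDP port.

```python
import socket


def statsd_counter(name, value=1):
    """Format a StatsD counter increment."""
    return f"{name}:{value}|c"


def statsd_timer(name, millis):
    """Format a StatsD timing sample in milliseconds."""
    return f"{name}:{millis}|ms"


def graphite_line(path, value, timestamp):
    """Format one line of the Graphite plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_statsd(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget a StatsD payload over UDP (host is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
```

For example, `send_statsd(statsd_counter("web.errors.5xx"))` would count one 5xx response under a hypothetical metric name; since UDP does not wait for a reply, the call adds almost no latency to the application.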

Page 27: Data-Driven Operations - Practice realtime data analyse

Q&A