Aleksandr Makhomet "Beyond the code, or how to monitor your...

Beyond the code. Keep your site healthy and users satisfied Aleksandr Makhomet Upwork https://www.facebook.com/amahomet http://twitter.com/amahomet

Upload: fwdays

Post on 22-Jan-2018


Page 1: Aleksandr Makhomet "Beyond the code, or how to monitor your PHP site"

Beyond the code. Keep your site healthy and users satisfied
Aleksandr Makhomet
Upwork

https://www.facebook.com/amahomet
http://twitter.com/amahomet

Page 2

What is Upwork.com

• Formerly odesk.com

• Upwork has 12+ million registered freelancers and 5+ million registered clients. Three million jobs are posted annually, worth a total of $1+ billion USD, making it the world's largest freelancer marketplace.

• High load (Alexa rank 420). Microservice architecture

Page 3

What I’m talking about

User Experience is extremely important

Things that matter:

➔ Low error level
➔ High performance
➔ High site availability (no outages)

Page 4

Apdex

Apdex (Application Performance Index)

[0 - 1]
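The slide only names the index and its range. As a minimal sketch of how the score is usually computed, the standard Apdex formula counts requests at or below a target threshold T as satisfied, requests between T and 4T as tolerating (weighted by one half), and the rest as frustrated. The threshold of 0.5 s and the sample response times below are assumptions chosen for illustration, not values from the talk.

```python
def apdex(response_times, threshold):
    """Return the Apdex score in [0, 1] for response times given in seconds."""
    if not response_times:
        raise ValueError("need at least one sample")
    # Satisfied: at or below the target threshold T.
    satisfied = sum(1 for t in response_times if t <= threshold)
    # Tolerating: above T but no more than 4T; counted with weight 1/2.
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical sample: 3 satisfied, 2 tolerating, 1 frustrated request.
samples = [0.2, 0.4, 0.5, 0.9, 1.6, 2.5]
score = apdex(samples, threshold=0.5)
print(f"Apdex = {score:.2f}")
```

A score of 1 means every request was satisfied; a score of 0 means every request was frustrated.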

Page 5

Importance of DevOps culture

What breaks production? New Features!

Page 6

Importance of DevOps culture

DevOps (Developers + Operations) is a culture that emphasizes the cooperation of both software developers and other information-technology (IT) professionals while automating the process of software delivery. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.

Leads to: Faster time to market, lower failure rate of new releases, shortened lead time between fixes, and faster mean time to recovery

Page 7

Managing errors on production

Errare humanum est

Page 8

Managing errors on production

Effective logs are important

➔ Follow PSR-3

➔ Write as many logs as possible

➔ Write full logs (user id, visitor id, stack trace, request details, controller/action, instance info ...)

➔ Write Request Id
➔ Use meaningful log messages
➔ Do not write sensitive data
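The checklist above can be sketched as follows. The talk targets PHP and PSR-3, where a log call takes a message plus a context array; this sketch uses Python's standard `logging` module as a stand-in, passing the context through `extra`. The `build_context` helper, the list of sensitive keys, the logger name and the sample values are all hypothetical.

```python
import logging
import uuid

# Hypothetical list of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"password", "token", "card_number"}


def build_context(request_id, user_id, route, extra):
    """Assemble the log context, masking the values of sensitive keys."""
    context = {"request_id": request_id, "user_id": user_id, "route": route}
    for key, value in extra.items():
        context[key] = "***" if key in SENSITIVE_KEYS else value
    return context


# A dedicated logger whose format prints the context next to the message.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s %(context)s"))
logger = logging.getLogger("app.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# One id per request, attached to every log line written while serving it.
request_id = uuid.uuid4().hex
context = build_context(
    request_id, 42, "jobs/search", {"query": "php", "password": "hunter2"}
)
logger.error("Job search failed: upstream timeout", extra={"context": context})
```

Because every line carries the same request id, all log entries produced by one request can be found with a single search.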

Page 9

Logging tools are important (ELK)

Effective tools are important
ELK = Logstash + Elasticsearch + Kibana

➔ Logstash - collects, filters and ships logs for storage
➔ Elasticsearch - powerful full-text search on top of Apache Lucene
➔ Kibana - UI for searching logs

Write logs in json format
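One way to produce JSON logs, sketched here with Python's standard `logging` module rather than a PHP logger, is a formatter that renders each record as a single JSON object per line, which log shippers can parse without custom patterns. The field names (`timestamp`, `level`, `request_id`, and so on) and the sample record are assumptions, not a schema from the talk.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object on a single line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Present only if the caller attached it to the record.
            "request_id": getattr(record, "request_id", None),
        }
        if record.exc_info:
            payload["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(payload)


# Build a record by hand so the example runs without any handler setup.
record = logging.LogRecord(
    name="app", level=logging.ERROR, pathname="app.py", lineno=0,
    msg="Payment failed for order %s", args=(1001,), exc_info=None,
)
record.request_id = "req-123"
line = JsonFormatter().format(record)
print(line)
```

In a real application the formatter would be attached to a handler with `handler.setFormatter(JsonFormatter())`.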

Demo

Page 11

Error level

Monitor your error level

➔ Graphite
➔ Google Analytics

Page 12

Performance

➔ Measure

➔ Group by controller/action or pageId

➔ Measure in detail
◆ Any external service: database, Memcache, Redis, whatever
◆ Any important component, like navigation
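A minimal sketch of such measurement is a timer that records durations under a metric name built from the page and the component being measured. The metric names, the in-memory `timings` store and the `time.sleep` calls standing in for real work are all assumptions; a real setup would forward the values to a metrics backend.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Metric name -> list of measured durations in milliseconds.
timings = defaultdict(list)


@contextmanager
def measure(metric):
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[metric].append((time.perf_counter() - start) * 1000.0)


# Time the whole page and, inside it, each component separately.
with measure("pages.jobs.search.total"):
    with measure("pages.jobs.search.database"):
        time.sleep(0.01)   # stands in for a database query
    with measure("pages.jobs.search.navigation"):
        time.sleep(0.005)  # stands in for rendering the navigation

for name, values in sorted(timings.items()):
    print(f"{name}: {values[0]:.1f} ms")
```

Nesting the timers this way shows both the total time per page and how much of it each dependency consumed.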

Page 13

Performance: Statistics
Mean can lie
Dataset of 10 requests, in ms: (2, 3, 5, 6, 6, 7, 9, 9, 26, 37)
Mean = (2+3+5+6+6+7+9+9+26+37) / 10 = 11 ms

Median = (6+7) / 2 = 6.5 ms

90th percentile dataset: (2, 3, 5, 6, 6, 7, 9, 9, 26)

Mean_90 = mean (90th percentile dataset) = 73 / 9 ≈ 8.1 ms
Upper_90 = max (90th percentile dataset) = 26 ms
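The figures on this slide can be reproduced with a short script. It assumes the "90th percentile dataset" means the lowest 90% of the sorted samples, which matches the nine values listed; the helper name `lower_percentile` is hypothetical. Note that those nine values sum to 73, so their mean is about 8.1 ms.

```python
import statistics


def lower_percentile(values, percent):
    """Return the lowest `percent` percent of the values, sorted ascending."""
    ordered = sorted(values)
    keep = round(len(ordered) * percent / 100)
    return ordered[:keep]


data = [2, 3, 5, 6, 6, 7, 9, 9, 26, 37]  # response times in ms

mean = statistics.mean(data)      # pulled up by the two slow requests
median = statistics.median(data)  # the typical request

subset = lower_percentile(data, 90)  # drops the single slowest request
mean_90 = statistics.mean(subset)
upper_90 = max(subset)

print(f"mean={mean} median={median} mean_90={mean_90:.2f} upper_90={upper_90}")
```

The mean is almost twice the median here, which is why percentile-based metrics describe what most users experience better than a plain average does.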

Page 14

Performance: Graphite stack

Page 15

Performance: Graphite

Graphite collects, stores, and displays time-series data in real time.
➔ Carbon - a high-performance service that listens for time-series data
➔ Whisper - a simple database library for storing time-series data
➔ Graphite-web - Graphite's user interface & API for rendering graphs and dashboards

Metric format:

<metric path> <metric value> <metric timestamp>

fwdays-demo.performance.pages.index 1 5098232342

Data retention:

retentions = 10:6h,60:14d,600:400d
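The metric format and the retention rule above can be sketched in a few lines. `format_metric` builds one line of Carbon's plaintext protocol, `send_metric` writes it to a Carbon listener (2003 is Carbon's default plaintext port; the host is an assumption), and `points_per_archive` shows how many data points each retention archive holds. All three function names are hypothetical, and the retention parser only handles the form used on the slide, where the step is a bare number of seconds.

```python
import socket
import time

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def format_metric(path, value, timestamp=None):
    """Build one line in the form '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one metric line to a Carbon listener over TCP (host is assumed)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(line.encode("ascii"))


def points_per_archive(retentions):
    """Return (seconds per point, number of points) for each archive."""
    result = []
    for archive in retentions.split(","):
        step, duration = archive.strip().split(":")
        total_seconds = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
        result.append((int(step), total_seconds // int(step)))
    return result


line = format_metric("fwdays-demo.performance.pages.index", 1, 5098232342)
archives = points_per_archive("10:6h,60:14d,600:400d")
print(line, end="")
print(archives)
```

Read this way, the retention rule keeps 10-second resolution for 6 hours, 1-minute resolution for 14 days and 10-minute resolution for 400 days. Calling `send_metric(line)` requires a running Carbon instance, so it is left out of the example run.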

Page 16

Performance: Graphite vs StatsD

Graphite works better with StatsD
StatsD is a forwarder to Graphite

➔ Non-blocking UDP protocol
➔ Aggregates data, high performance
➔ Supports 4 useful metric types: Counting, Timers, Gauges, Sets

To integrate, build your own simple script or use any open source, most popular
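As a rough idea of what "your own simple script" could look like, the sketch below builds StatsD lines in the `<name>:<value>|<type>` wire format (`c` for counters, `ms` for timers, `g` for gauges, `s` for sets) and sends them over UDP. Port 8125 is the StatsD default; the host, the helper names and every metric name except `fwdays-demo.performance.pages.index` are assumptions.

```python
import socket


def statsd_line(name, value, metric_type):
    """Build one StatsD line: '<name>:<value>|<type>'."""
    return f"{name}:{value}|{metric_type}"


def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; metrics must never break the request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    except OSError:
        pass  # a lost metric is acceptable, a failed page is not
    finally:
        sock.close()


counter = statsd_line("fwdays-demo.errors", 1, "c")
timer = statsd_line("fwdays-demo.performance.pages.index", 320, "ms")
gauge = statsd_line("fwdays-demo.queue.size", 17, "g")
unique = statsd_line("fwdays-demo.visitors", "user-42", "s")

for line in (counter, timer, gauge, unique):
    send(line)
    print(line)
```

Because UDP needs no connection and no acknowledgement, the application does not wait for the metrics server, which is what makes this safe to call on every request.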

Page 17

Performance: Graphite graphs

You can combine, modify and filter data to get the graph you need

Demo

Page 18

Grafana

Grafana provides free, powerful and good-looking dashboards on top of Graphite

Demo

Page 19

Performance: prevent degradation

➔ Make a performance degradation check part of your definition of done

➔ Add a performance degradation check to your code review checklist

➔ Use load testing

Page 20

Performance: Alternatives

Google Analytics
➔ Keeps history for a long time
➔ Segments are great: get performance for different types of users

New Relic
➔ Powerful performance analytics out of the box
➔ Sometimes uses magic
➔ Has a free light account with 1 day data retention

Demo

Page 21

Performance: Zipkin
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures

Page 22

Alerting
Set up at least a simple health check

Application metrics
➔ 5xx / 4xx / 3xx / 2xx rate
➔ Error rate
➔ Response time
➔ Apdex

Server metrics
➔ CPU usage
➔ Load average
➔ Memory usage
➔ Disk space
➔ Disk I/O
➔ Network I/O

Notification channels
➔ Chat
➔ Email
➔ SMS/push
➔ Phone call

Thresholds
➔ Warning
➔ Critical
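The two-level threshold idea can be sketched as a small check that turns a metric into an alert level. The function names, the status counts and the threshold values (1% warning, 5% critical for the 5xx rate) are assumptions for illustration; real thresholds depend on the site's normal behaviour.

```python
def classify(value, warning, critical):
    """Map a metric value to an alert level using two thresholds."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def error_rate(status_counts):
    """Share of 5xx responses among all responses in the window."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    return status_counts.get("5xx", 0) / total


# Hypothetical response counts for one monitoring window.
counts = {"2xx": 940, "3xx": 30, "4xx": 18, "5xx": 12}
rate = error_rate(counts)
level = classify(rate, warning=0.01, critical=0.05)
print(f"5xx rate = {rate:.3f}, level = {level}")
```

A warning would typically go to a quiet channel such as chat or email, while a critical level would escalate to SMS or a phone call.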

Page 23

Alerting: Best Practices

➔ Avoid setting thresholds too low. Avoid false positives

➔ Adjust your conditions over time

Page 24

Alerting: Implementations

On top of Graphite
➔ List of free tools (Cabot)

New Relic
➔ Advanced in paid version
➔ Basic in free version

CloudWatch (if Amazon)

Zabbix / Nagios / Icinga

Page 25

Alerting: PagerDuty

PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on-duty engineer if there’s a problem.

Demo

Page 26

Incidents
An incident is a critical violation

➔ Create an #incident channel
➔ Define an incident escalation policy
◆ Define a person who can make decisions
◆ Define a duty officer
◆ Enable a moratorium on production changes until resolved

➔ Track metrics
◆ MTTR - Mean time to resolve
◆ MTTD - Mean time to detect
◆ MTTE - Mean time to escalate
◆ MTBF - Mean time between failures

➔ Do postmortems
➔ Have visibility into production changes, especially with microservices
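The incident metrics above can be computed from timestamps recorded for each incident. The slide does not define the exact intervals, so this sketch assumes: time to detect runs from start to detection, time to escalate from detection to escalation, time to resolve from start to resolution, and time between failures from one incident's start to the next one's start. The incident records are invented sample data.

```python
from datetime import datetime, timedelta


def mean_delta(deltas):
    """Average a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


# Hypothetical incident log; a real one would come from the incident tracker.
incidents = [
    {
        "started": datetime(2018, 1, 1, 10, 0),
        "detected": datetime(2018, 1, 1, 10, 10),
        "escalated": datetime(2018, 1, 1, 10, 15),
        "resolved": datetime(2018, 1, 1, 11, 0),
    },
    {
        "started": datetime(2018, 1, 11, 10, 0),
        "detected": datetime(2018, 1, 11, 10, 20),
        "escalated": datetime(2018, 1, 11, 10, 35),
        "resolved": datetime(2018, 1, 11, 10, 40),
    },
]

mttd = mean_delta([i["detected"] - i["started"] for i in incidents])
mtte = mean_delta([i["escalated"] - i["detected"] for i in incidents])
mttr = mean_delta([i["resolved"] - i["started"] for i in incidents])

# Assumed definition: gap between the starts of consecutive incidents.
starts = sorted(i["started"] for i in incidents)
mtbf = mean_delta([later - earlier for earlier, later in zip(starts, starts[1:])])

print(f"MTTD={mttd} MTTE={mtte} MTTR={mttr} MTBF={mtbf}")
```

Tracking these numbers over time shows whether detection, escalation and recovery are actually getting faster.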

Page 27

Postmortems

Before
➔ What other parts of the site might also have similar issues?
➔ How can we determine the root cause faster?
➔ How can we prevent it in the future?
➔ Lessons learned

During
➔ Do not offend
➔ Do not feel offended

After
➔ Create a document with answers and share it
➔ File issues

Page 29

The End

Page 30

Questions?
Aleksandr Makhomet

https://www.facebook.com/amahomet
http://twitter.com/amahomet

http://fwdays.com
http://ergo.place

Upwork is hiring. If you are looking for a remote senior PHP developer position, ping me