alerting with mysql and nagios sheeri cabral senior db admin/architect mozilla @sheeri

64
Alerting With MySQL and Nagios http://bit.ly/nagios_mysql2013 Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Upload: lindsay-stone

Post on 03-Jan-2016

223 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Alerting With MySQL and Nagios

http://bit.ly/nagios_mysql2013

Sheeri CabralSenior DB Admin/Architect

Mozilla@sheeri

Page 2: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

What is Monitoring?

Threshold alerting

Graphing/trending

Page 3: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Why Monitor?

Problem alerting

Find patterns

– capacity planning

– troubleshooting

Early warning for potential issues

Page 4: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

What to Alert?

Problems you can fix

– max_connections

– long running queries

– locked queries

– backup disk space

Page 5: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Nagios is great because...

...anyone can write a plugin

Page 6: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

The problem with Nagios...

...anyone can write a plugin

Page 7: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Official Nagios Pluginsfor MySQL

check_mysql

check_mysql_query

Page 8: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

check_mysql

db connectivity slave running slave lag using seconds_behind_master

Page 9: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

check_mysql_query

Checks the output of a query is within a certain range (numerical)

Page 10: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

System vars Status vars Caching Calculations

check_mysql yes yes no no

check_mysql (standard) no yes no no

check_mysqld_status no yes no no

check_mysql_stats yes no yes no

check_mysqld no many no no

check_mysql_health yes yes no Hard-coded

check_mysql yes yes no Hard-coded

Third party plugins

Page 11: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Let us know what you'd like to see!”

Page 12: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

What I wanted

System vars Status vars Caching Calculations

mysql_health_check.plyes yes yes flexible

Page 13: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

mysql_health_check.pl

Page 14: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Caching

Save information to a file

Page 15: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Caching

Save information to a file --cache-dir /path/to/dir/

Page 16: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Caching

Save information to a file --cache-dir /path/to/dir/

Use the file instead of connecting again

Page 17: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Caching

Save information to a file --cache-dir /path/to/dir/

Use the file instead of connecting again

--max-cache-age <seconds>

Page 18: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Caching

Save information to a file --cache-dir /path/to/dir/

Use the file instead of connecting again

--max-cache-age <seconds>

--no-cache to force connection

Page 19: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

--mode=varcomp

%metadata{varstatus}

Page 20: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

--mode=varcomp

%metadata{varstatus} SHOW GLOBAL VARIABLES SHOW GLOBAL STATUS

Page 21: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

--mode=varcomp

%metadata{varstatus} SHOW GLOBAL VARIABLES SHOW GLOBAL STATUS

--expression allows word replacement

Page 22: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

--mode=varcomp

%metadata{varstatus} SHOW GLOBAL VARIABLES SHOW GLOBAL STATUS

--expression allows word replacement

--warning --critical are flexible

Page 23: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Command Definitiondefine command {

command_name check_mysql_tmp_tables

command_line $USER1$/mysql_health_check.pl

--hostname $HOSTADDRESS$ --user myuser --password mypass

Page 24: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Command Definitiondefine command {

command_name check_mysql_tmp_tables

command_line $USER1$/mysql_health_check.pl

--hostname $HOSTADDRESS$ --user myuser --password mypass

--cache-dir=/var/lib/nagios/mysql_cache

--max-cache-age=300

Page 25: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Command Definitiondefine command {

command_name check_mysql_cxns

command_line $USER1$/mysql_health_check.pl

--hostname $HOSTADDRESS$ --user myuser --password mypass

--cache-dir=/var/lib/nagios/mysql_cache

--max-cache-age=300

--mode=varcomp

--expression=

"Max_used_connections/max_connections * 100"

--warning=">80" –critical=">90"

}

Page 26: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Command Definition

command_name check_mysql_cxns

--mode=varcomp

--expression=

"Max_used_connections/max_connections * 100"

--warning=">80" –critical=">90"

}

Page 27: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Service Definition

define service { use generic-service host_name __HOSTNAME__ service_description MySQL Connections check_command check_mysql_cxns }

Page 28: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Rates Compare to last run

Page 29: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Rates Compare to last run

mode=lastrun-varcomp

Page 30: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Rates Compare to last run

mode=lastrun-varcomp

current{expr}

Page 31: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Rates Compare to last run

mode=lastrun-varcomp

current{expr}

lastrun{expr}

Page 32: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Rates Compare to last run

mode=lastrun-varcomp

current{expr}

lastrun{expr}

--comparison, no warn/crit

Page 33: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Query Rate

mysql_health_check.pl [host,user,pass]

Page 34: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

mysql_health_check.pl [host,user,pass]

--mode lastrun-varcomp

Query Rate

Page 35: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

mysql_health_check.pl [host,user,pass]

--mode lastrun-varcomp

--expression "(current{Queries} - lastrun{Queries})

Query Rate

Page 36: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

mysql_health_check.pl [host,user,pass]

--mode lastrun-varcomp

--expression "(current{Queries} - lastrun{Queries})

/ (current{Uptime} – lastrun{Uptime})"

Query Rate

Page 37: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

mysql_health_check.pl [host,user,pass]

--mode lastrun-varcomp

--expression "abs((current{Queries} - lastrun{Queries})

/ (current{Uptime} – lastrun{Uptime}))*100"

--comparison ">80"

Query Rate

Page 38: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

define command {

command_name check_mysql_query_rate

command_line $USER1$/mysql_health_check.pl

--hostname $HOSTADDRESS$ --user myuser --password mypass

--cache-dir=/var/lib/nagios/mysql_cache

--max-cache-age=300

--mode=lastrun-varcomp

--expression=

"abs((current{Queries} - lastrun{Queries}) / (current{Uptime} – lastrun{Uptime}))*100"

--comparison > 100

}

Page 39: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Other Modes

--mode=long-query --mode=locked-query

%metadata{proc_list} SHOW FULL PROCESSLIST

Page 40: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Sample Command Definition

define command {

command_name check_mysql_locked_queries

command_line $USER1$/mysql_health_check.pl

--hostname $HOSTADDRESS$ --user myuser --password mypass

--cache-dir=/var/lib/nagios/mysql_cache

--max-cache-age=300

--mode=locked-query

--warning=$ARG1$ --critical=$ARG2$

}

Page 41: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Extending Information

sub fetch_server_meta_data {}

add a new hash key to %metadata

$metadata{varstatus} =

$dbh->selectall_arrayref(

q|SHOW GLOBAL VARIABLES|);

Page 42: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Extending Information

sub fetch_server_meta_data {}

add a new hash key to %metadata

$metadata{innodb_status} =

$dbh->selectall_arrayref(

q|SHOW ENGINE INNODB STATUS|);

Page 43: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

For example

• %metadata{innodb_status}

– SHOW ENGINE INNODB STATUS• Already exists, unused

Page 44: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

For example

• %metadata{innodb_status}

– SHOW ENGINE INNODB STATUS• Already exists, unused

• %metadata{master_status}

– SHOW MASTER STATUS

Page 45: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

For example

• %metadata{innodb_status}

– SHOW ENGINE INNODB STATUS• Already exists, unused

• %metadata{master_status}

– SHOW MASTER STATUS

• %metadata{slave_status}

– SHOW SLAVE STATUS

Page 46: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Standard” checks

% max connections

--expression 'Threads_connected/max_connections*100'

Page 47: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Standard” checks

% max connections

--expression 'Threads_connected/max_connections*100'

InnoDB enabled

--expression “have_innodb”

--comparison=“ne 'YES'”

Page 48: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

define command {

command_name check_mysql_connections

command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Threads_connected/max_connections * 100" --comparison=">80"

}

define command {

command_name check_mysql_innodb

command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="have_innodb" --comparison="ne 'YES'"

}

Page 49: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Did MySQL Crash?

Nagios set to check every 5 minutes

Page 50: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Did MySQL Crash?

Nagios set to check every 5 minutes

Might miss a crash

Page 51: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Did MySQL Crash?

Nagios set to check every 5 minutes

Might miss a crash

Uptime!

--expression 'Uptime'

--comparison=“<1800”

Page 52: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Did MySQL Crash?

define command {

command_name check_mysql_uptime

command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Uptime" --comparison="<1800"

}

Page 53: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Standard” checks

read_only for slaves

--expression “read_only”

--comparison=“ne 'YES'”

Page 54: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Standard” checks

read_only for slaves

--expression “read_only”

--comparison=“ne 'YES'”

% of sleeping connections

# connected, # running, # max connections

Page 55: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

“Standard” checks

read_only for slaves

--expression “read_only”

--comparison=“ne 'YES'”

% of sleeping connections

# connected, # running, # max connections

--expression="(Threads_connected-Threads_running)/max_connections * 100"

--comparison=">$ARG1$"

Page 56: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

define command {

command_name check_mysql_read_only

command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="read_only" --comparison="ne 'YES'"

}

define command {

command_name check_mysql_connections

command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="(Threads_connected-Threads_running)/max_connections * 100" --comparison=">$ARG1$"

}

Page 57: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Limitations

One check/calculation per Nagios service

– But, you can use many variables

– Cached output Does not output for performance data

– Not hard to modify, just no need yet

Page 58: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Where to get it

https://github.com/palominodb/PalominoDB-Public-Code-Repository/tree/master/nagios/

www.palominodb.com->Community->Projects

Page 59: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Other Nagios Plugins

Page 60: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Nagios Plugin for Partitions

table_partitions.pl --host

--user --pass

--database --table

--range [days|weeks|months]

--verify #

Page 61: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Nagios Plugin for Partitions

table_partitions.pl --host db1.mozilla.com

--user nagiosuser --pass nagiospass

--database addons --table user_addons

--range months

--verify 3

Page 62: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

After using pt-table-checksum (master)

Slaves have a table with checksum

this_crc vs master_crc

Nagios Plugin for Checksums

Page 63: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

Nagios Plugin for Checksums

check_table_checksums.pl -H host

--user username --pass password

-T checksum_table

-I checksum_freshness

-b dbs,to,skip

Page 64: Alerting With MySQL and Nagios  Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri

More Resources

www.palominodb.com->Community->Projects

www.mysqlmarinate.com

[email protected]

OurSQL podcast (oursql.com)

slides: http://bit.ly/nagios_mysql2013

MySQL Administrator's Bible

youtube.com/tcation

http://planet.mysql.com