ZFS Monitoring and Management at LLNL


Page 1: ZFS Monitoring and Management at LLNL

LLNL-PRES-724397

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

ZFS Monitoring and Management at LLNL

Tony Hutter

ZFS User Conference 2017

March 16, 2017

Page 2: ZFS Monitoring and Management at LLNL


Meet "Zinc", our new 18PB filesystem

Contract awarded to RAID Inc.

Lustre 2.8 on top of ZFS

Wanted a vendor-agnostic software stack

16 MDS nodes, 36 OSS nodes

2880 8TB HDDs, 96 800GB SSDs

8 24-bay SSD JBODs, 36 84-bay HDD JBODs

Smaller configuration for "Brass" and "Jet" systems, but same RAID Inc. hardware.

Page 3: ZFS Monitoring and Management at LLNL


Glamour shot

Page 4: ZFS Monitoring and Management at LLNL


Node configuration

[Diagram: metadata rack with MDS Nodes 0-3 and upper/lower (U/L) MDT enclosures; object store rack with OSS Nodes 0-1 and upper/lower (U/L) OST enclosures]

Page 5: ZFS Monitoring and Management at LLNL


Multipath drives

Multipath = each disk has two SAS connections

Increases bandwidth and provides link failover

Each disk shows up as two block devices (e.g. /dev/sda and /dev/sdb), and also as a multipath device /dev/dm-N

We use the ZFS vdev_id script to make friendly aliases for the drives in /dev/disk/by-vdev/ (a config sketch follows below):

/dev/disk/by-vdev/L0  -> ../../dm-50
/dev/disk/by-vdev/L1  -> ../../dm-81
/dev/disk/by-vdev/L2  -> ../../dm-99
...
/dev/disk/by-vdev/L83 -> ../../dm-157
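For context (not from the slides): vdev_id reads /etc/zfs/vdev_id.conf to build those aliases. A minimal sketch of the kind of config that produces "L<slot>"-style names; the PCI slot addresses, ports, and channel names here are made up:

# /etc/zfs/vdev_id.conf -- sketch only; PCI addresses, ports, and channel names are hypothetical
multipath     yes          # alias the dm- multipath device, not the individual sd paths
topology      sas_direct
phys_per_port 4
#        PCI_SLOT  PORT  CHANNEL
channel  85:00.0   1     L        # lower enclosure
channel  85:00.0   0     U        # upper enclosure

After editing the config, re-trigger udev (e.g. udevadm trigger) so the /dev/disk/by-vdev/ links are regenerated.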

Page 6: ZFS Monitoring and Management at LLNL


Monitoring with Splunk

Splunk is a syslog processing engine with a web front-end.

It has a query language that lets you construct tables and graphs from syslog values.

You can group multiple graphs and tables into a "dashboard", for a single-pane-of-glass view of your systems.

Just log key=value pairs to syslog and Splunk can graph them (a logging sketch follows below).
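A minimal sketch (not LLNL's actual script) of emitting key=value pairs that Splunk can pick up, using logger(1); the per-vdev error fields on the next slide would be parsed out of 'zpool status' in the same spirit:

#!/bin/bash
# Sketch only: log one key=value line per pool so Splunk can graph pool health.
for pool in $(zpool list -H -o name); do
    health=$(zpool list -H -o health "$pool")
    logger -t zpool_status "pool=${pool}, state=${health}"
done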

Page 7: ZFS Monitoring and Management at LLNL


Splunk example

syslog:

Dec 6 14:00:41 jet21 zpool_status: pool=mypool, vdev=B7, state=FAULTED, read_errors=0, write_errors=3, chksum_errors=0, resilver=0

Splunk query:

zpool_status host=* pool=* | where state!="ONLINE" OR read_errors!=0 OR write_errors!=0 OR chksum_errors!=0 ...

syslog + Splunk query = dashboard table
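To turn the matching events into a table like the one on the next slide, the query would typically end with Splunk's table command (the field names are the ones logged above):

zpool_status host=* pool=*
| where state!="ONLINE" OR read_errors!=0 OR write_errors!=0 OR chksum_errors!=0
| table host pool vdev state read_errors write_errors chksum_errors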

Page 8: ZFS Monitoring and Management at LLNL


zpool status across all filesystems

Page 9: ZFS Monitoring and Management at LLNL


Graphing zpool status over time
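A sketch of the kind of query behind a time-series panel like this, using Splunk's timechart command (field names are from the earlier syslog line):

zpool_status host=* pool=*
| timechart span=1h max(write_errors) by vdev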

Page 10: ZFS Monitoring and Management at LLNL


SMART stats (smartctl -a)

We're logging SMART status, read & write uncorrectable errors, and the Grown Defect List (GLIST). All of our drives report SAS stats.
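A hedged example of grabbing those counters, following the same 'zpool status -c' + smartctl pattern shown later in the deck (the grep patterns match standard smartctl -a output for SAS drives):

# Sketch: GLIST size and the uncorrectable read/write error rows, per vdev
zpool status -c 'smartctl -a $VDEV_UPATH | grep -E "grown defect list|^(read|write):"'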

Page 11: ZFS Monitoring and Management at LLNL


SMART Grown Defect List (smartctl -a)

We don't yet have enough data to know whether GLIST is a predictor of pending drive failure.

Page 12: ZFS Monitoring and Management at LLNL


SMART Drive Temperatures (smartctl -a)

We've noticed on occasion that a few of our disks run hotter than spec.

Page 13: ZFS Monitoring and Management at LLNL


SMART Drive Temperatures (smartctl -a)

Drives at the back of the enclosure get hotter, so we adjusted our raidz2 configuration to include a mix of drives from the front and back.
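This is visible in the later 'zpool status -c' example, where raidz2-0 spans bays L0/L1 (~25-26 C) through L56/L70 (~43-46 C). A sketch, not the slides' actual command, of creating a pool that way using the by-vdev aliases (only the first raidz2 group is shown):

zpool create jet18 raidz2 \
    /dev/disk/by-vdev/L0  /dev/disk/by-vdev/L1  \
    /dev/disk/by-vdev/L14 /dev/disk/by-vdev/L15 \
    /dev/disk/by-vdev/L28 /dev/disk/by-vdev/L29 \
    /dev/disk/by-vdev/L42 /dev/disk/by-vdev/L43 \
    /dev/disk/by-vdev/L56 /dev/disk/by-vdev/L70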

Page 14: ZFS Monitoring and Management at LLNL


Enclosure sensor values (sg_ses)

We graph enclosure fan speed, temperature, voltage, and current.
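One way (not necessarily the slides' exact commands) to pull those readings for logging, assuming lsscsi and sg_ses are installed:

# Sketch: find the SES enclosure devices and dump their sensor elements
for sg in $(lsscsi -g | awk '$2 == "enclosu" {print $NF}'); do
    sg_ses --join "$sg" | grep -Ei 'temperature|cooling|voltage|current'
done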

Page 15: ZFS Monitoring and Management at LLNL


Enclosure sensor values (sg_ses)

We graph the number of SES values reporting "Critical" to look for potential hardware problems.

Page 16: ZFS Monitoring and Management at LLNL


Enclosure sensor values (sg_ses)

Page 17: ZFS Monitoring and Management at LLNL


SAS PHY Errors (/sys/class/sas_phy/...)

Bad SAS PHYs can create ZFS read/write errors and cause drives to disappear and reappear.
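A sketch of logging the standard per-PHY error counters from sysfs (the attribute names are the stock sas_phy ones; the key=value format just mirrors the earlier Splunk examples):

# Sketch only: one key=value syslog line per SAS PHY
for phy in /sys/class/sas_phy/*; do
    name=$(basename "$phy")
    invalid=$(cat "$phy/invalid_dword_count")
    sync_loss=$(cat "$phy/loss_of_dword_sync_count")
    disparity=$(cat "$phy/running_disparity_error_count")
    resets=$(cat "$phy/phy_reset_problem_count")
    logger -t sas_phy "phy=$name, invalid_dword=$invalid, loss_of_dword_sync=$sync_loss, running_disparity_error=$disparity, phy_reset_problem=$resets"
done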

Page 18: ZFS Monitoring and Management at LLNL


Disk history by drive serial number

Periodically logging drive serial numbers allows you to see if and when drives were replaced, and helps locate a drive if it's been moved to another enclosure.

You can also build a record of any SMART errors associated with that serial number in case you need to RMA the drive.
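A hedged sketch of that periodic logging, reusing the 'zpool status -c' pattern from later in the deck so the serial lands next to the vdev name (smartctl -i prints the serial number field; logger reads the lines from stdin):

# Sketch: run hourly from cron to build a vdev -> serial history in syslog
zpool status -c 'smartctl -i $VDEV_UPATH | grep -i "serial number"' | logger -t disk_serial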

Page 19: ZFS Monitoring and Management at LLNL


zpool iostat bandwidth

Page 20: ZFS Monitoring and Management at LLNL


zpool iostat latency
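For reference, the zpool iostat modes behind graphs like these (-l and -w are newer zpool iostat options and may not exist in older releases):

zpool iostat -v 60     # per-vdev bandwidth and IOPS, sampled every 60 s
zpool iostat -lv 60    # adds average latency columns
zpool iostat -w        # full latency histograms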

Page 21: ZFS Monitoring and Management at LLNL


Logging scripts

We log stats with cron every hour. We also log zpool stats on every vdev state change via a zedlet (a minimal zedlet sketch follows after the example below).

'zpool status -c' can be useful for grabbing stats:

# zpool status -c 'smartctl -a $VDEV_UPATH | grep "Drive Temp"' ...

NAME          STATE     READ WRITE CKSUM
jet18         DEGRADED     0     0     0
  raidz2-0    ONLINE       0     0     0
    L0        ONLINE       0     0     0  Current Drive Temperature: 26 C
    L1        ONLINE       0     0     0  Current Drive Temperature: 25 C
    L14       ONLINE       0     0     0  Current Drive Temperature: 33 C
    L15       ONLINE       0     0     0  Current Drive Temperature: 30 C
    L28       ONLINE       0     0     0  Current Drive Temperature: 37 C
    L29       ONLINE       0     0     0  Current Drive Temperature: 35 C
    L42       ONLINE       0     0     0  Current Drive Temperature: 40 C
    L43       ONLINE       0     0     0  Current Drive Temperature: 39 C
    L56       ONLINE       0     0     0  Current Drive Temperature: 43 C
    L70       ONLINE       0     0     0  Current Drive Temperature: 46 C
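A minimal sketch of the statechange zedlet mentioned above (the script name is hypothetical; the ZEVENT_* variables are the ones ZED exports to zedlets):

#!/bin/sh
# Sketch only: e.g. /etc/zfs/zed.d/statechange-log.sh, made executable.
# ZED runs zedlets whose filename prefix matches the event subclass, so this
# fires on every vdev state change; log it in the same key=value style.
logger -t zpool_status \
    "pool=${ZEVENT_POOL}, vdev=${ZEVENT_VDEV_PATH}, state=${ZEVENT_VDEV_STATE_STR}"
exit 0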

Page 22: ZFS Monitoring and Management at LLNL


Slot fault LEDs

We use zed to automatically turn slot fault LEDs on/off when vdevs go FAULTED/DEGRADED/UNAVAIL.

We enable auto-replace on our pools so we can swap a new disk in for an old one and have it auto-resilver.

This allows operations staff to replace bad drives without being root. (A sketch of the relevant settings follows below.)
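For reference, a sketch of the knobs involved (the setting name comes from zed.rc, the property from zpool; the pool name is the one from the earlier example):

# Let ZED drive enclosure fault LEDs on vdev state changes (in /etc/zfs/zed.rc):
#   ZED_USE_ENCLOSURE_LEDS=1
# Auto-resilver when a replacement disk is inserted into the same slot:
zpool set autoreplace=on jet18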

Page 23: ZFS Monitoring and Management at LLNL