02_aggr_wafl_v2

26
Module 2. WAFL Inconsistencies WAFL Inconsistencies DOT 7.0 Update Aggregate Performance Student Guide

Upload: senthil

Post on 22-Feb-2015

Page 1: 02_Aggr_Wafl_v2

Module 2. WAFL Inconsistencies

WAFL Inconsistencies

DOT 7.0 Update Aggregate Performance

Student Guide


Objectives

At the conclusion of this module, you will be able to:
– Describe how to fix WAFL inconsistencies with aggregates and flexible volumes
– Execute WAFL_check and wafliron on flexible volumes and aggregates


Aggregates & Flexible Volumes

With Aggregates & Flexible Volumes
– Data is actually stored in the aggregate
– Therefore, file system inconsistencies actually live in the aggregate
– Aggregates can be inconsistent
– WAFL_check and wafliron are aggregate operations


Tools for Aggregates

Filer offline
– WAFL_check
  • Prompts for which aggregates to check (also checks all flexible volumes contained inside each aggregate)
– WAFL_check <aggrname> (checks the flexible volumes contained in the named aggregate)

Filer offline momentarily
– wafliron (from the 1-5 menu)
  • Checks every volume. This includes aggregates with flexvols, traditional volumes, and even the root volume
– aggr wafliron (from the command prompt)
  • Non-root aggregates

Utilized on traditional volumes as before
• vol wafliron (from the command prompt)
• Can use aggr wafliron as well


Running WAFL_check

Running WAFL_check will check the flexible volumes in the aggregate

Selection (1-5)? WAFL_check aggr1
Checking aggr1...
WAFL_check NetApp Release RanchorsteamN_040517_2215
Starting at Wed May 19 00:43:27 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
...
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents


Running WAFL_check

Checking volume vol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
...
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 0
Phase [5.6] time in seconds: 0
WAFL_check time in seconds: 7
(No filesystem state changed.)
Phase 5b time in seconds: 7
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 0
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 0
Phase 6 time in seconds: 0
WAFL_check total time in seconds: 14
(No filesystem state changed.)

Note: The [ ] around the phase number indicates an operation on a flexible volume.

Changes to flexible volumes are batched up and run at the end of the job.
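When reviewing a captured WAFL_check console log, the bracket convention can be used to separate aggregate phases from flexible-volume phases. The following is a minimal sketch in Python, assuming the log lines look exactly like the ones on this slide; the function name classify_phase and the returned field names are hypothetical and are not part of Data ONTAP.

```python
import re

# Matches "Phase 5a: Check volume inodes" as well as
# "Phase [5.6d] time in seconds: 0". Brackets mark a flexible-volume phase.
PHASE_RE = re.compile(
    r"^Phase (?:\[(?P<flex>[0-9a-z.]+)\]|(?P<aggr>[0-9a-z.]+))"
    r"(?P<timing> time in seconds)?: (?P<rest>.*)$"
)


def classify_phase(line):
    """Return a dict describing one WAFL_check phase line, or None."""
    m = PHASE_RE.match(line.strip())
    if m is None:
        return None  # not a phase line (for example "Checking volume vol1...")
    return {
        "phase": m.group("flex") or m.group("aggr"),
        "flexible_volume": m.group("flex") is not None,
        "is_timing": m.group("timing") is not None,
        "detail": m.group("rest"),
    }
```

Filtering a log on the flexible_volume field gives only the per-volume work, which is the part that appears inside Phase 5b.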


Running WAFL_check

Upon completion of WAFL_check you will be prompted to apply changes
– These changes have been queued to be fixed on aggregates and flexible volumes inside the aggregate


Running wafliron

filer*> aggr wafliron start aggr1
Wed May 19 00:38:20 GMT [wafl.iron.start:notice]: Starting wafliron on aggregate aggr1.
Wed May 19 00:38:21 GMT [wafl.iron.start:notice]: Starting wafliron on volume vol1.
filer*> Wed May 19 00:39:05 GMT [wafl.scan.iron.done:info]: Volume vol1, wafliron completed.
Wed May 19 00:39:06 GMT [wafl.scan.iron.done:info]: Aggregate aggr1, wafliron completed.
Wed May 19 00:39:12 GMT [wafl.scan.typebits.done:info]: Type bit scan done on vol vol1.
filer*>

Note: wafliron will iron the flexible volumes in the aggregate.
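To track when wafliron started and finished on each aggregate and volume, the console messages can be split into timestamp, event name, severity, and text. This is a sketch only, assuming the message layout shown on this slide; parse_console_event and its field names are hypothetical.

```python
import re

# Layout assumed from the slide:
#   Wed May 19 00:38:20 GMT [wafl.iron.start:notice]: Starting wafliron ...
EVENT_RE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)


def parse_console_event(line):
    """Split one console message into its parts; return None for other lines."""
    # A prompt such as "filer*> " may share the line with a message.
    line = re.sub(r"^\S+> ", "", line.strip())
    m = EVENT_RE.match(line)
    return m.groupdict() if m else None
```

Selecting the events named wafl.iron.start and wafl.scan.iron.done is enough to pair each start with its completion.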


wafliron status

filer*> aggr wafliron status
wafliron is active on aggregate: aggr1
Scanning (7% done).
filer*>
filer*> wafl scan status
Aggregate aggr1:
Scan id Type of scan progress
3 wafliron demand 73 (27/27) of 1092
Volume vol1:
Scan id Type of scan progress
4 wafliron demand 69 (20/20) of 693

Notice there are 2 scanners that are active.
– Scan id 3 is ironing the aggregate aggr1
– Scan id 4 is ironing the flexible volume vol1 that is contained inside aggr1.

Running wafliron with –f on a volume will force a wafliron on volumes which are read-only (snapshots/syncmirror)
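The wafl scan status listing can be turned into one record per scanner, which makes it easy to see that an aggregate and each of its flexible volumes are ironed by separate scans. The sketch below assumes the output has exactly the layout shown on this slide. The progress column is kept as an uninterpreted string because the slide does not explain its fields, and parse_scan_status is a hypothetical helper name.

```python
import re

HEADER_RE = re.compile(r"^(Aggregate|Volume) (\S+):$")
# scan id, scan type (text without digits), then the raw progress column
SCAN_RE = re.compile(r"^(\d+)\s+(\D+?)\s+(\d.*)$")


def parse_scan_status(text):
    """Return one dict per scanner listed in 'wafl scan status' output."""
    scans = []
    container = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            container = (header.group(1).lower(), header.group(2))
            continue
        scan = SCAN_RE.match(line)
        if scan and container:
            scans.append({
                "kind": container[0],       # "aggregate" or "volume"
                "name": container[1],
                "scan_id": int(scan.group(1)),
                "scan_type": scan.group(2),
                "progress": scan.group(3),  # left uninterpreted
            })
    return scans
```

Applied to the listing above, this yields two records: scan id 3 on aggregate aggr1 and scan id 4 on volume vol1.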


Logs and Errors

Logs
– wafliron still logs errors with syslog messages
– WAFL_check is the same as before for flexible volumes
  • Logged to /etc/crash/wafl/
– For aggregates, it's a two-step process:
  – /vol/<aggrname>/WAFL_check_logs/WAFL_check (kept in the metadir in the aggregate, as the aggregate is not mountable)
  – Once booted, info is placed into:
    » /vol/<rootvol>/etc/WAFL_check_logs/<aggrname>/WAFL_check
– If there are no errors, a log will NOT be created

Errors
filer*> vol wafliron start vol1
Cannot run wafliron on a flexible volume.
filer*> vol wafliron start aggr1
vol wafliron: 'aggr1' is an aggregate; use 'aggr wafliron'
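The two errors above follow from one rule: wafliron is started on the container that actually holds the data. A small sketch of that rule is shown here. The helper wafliron_command and its kind labels are hypothetical, the "vol wafliron start" form for a traditional volume is inferred from the command syntax on this slide, and the root aggregate (ironed from the 1-5 menu) is out of scope.

```python
def wafliron_command(name, kind, containing_aggregate=None):
    """Return the console command to start wafliron for a storage container.

    kind is one of "aggregate", "traditional", or "flexible" (labels made up
    for this sketch).
    """
    if kind == "aggregate":
        return f"aggr wafliron start {name}"
    if kind == "traditional":
        # Traditional volumes are handled as before.
        return f"vol wafliron start {name}"
    if kind == "flexible":
        # A flexible volume cannot be ironed directly; iron its aggregate.
        if containing_aggregate is None:
            raise ValueError("a flexible volume is ironed through its aggregate")
        return f"aggr wafliron start {containing_aggregate}"
    raise ValueError(f"unknown container kind: {kind!r}")
```

For example, asking for the flexible volume vol1 inside aggr1 returns the aggr1 command rather than a vol wafliron command that the filer would reject.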


Internal information

File System Inconsistencies

With Aggregates and Flexible Volumes


Additional Information

Anchor Steam TOI
– Lots of bit and Engineering level information
http://web.netapp.com/engineering/projects/wafl/vv/vvol_wack_notes.txt

Eng notes: These notes are truncated; for the full notes, go to the URL.

Notes and thoughts about WAFL_check and (hybrid) virtual volumes

Andy Kahn ([email protected])

Last updated Feb 7th, 2003

$Id: //depot/doc/main/project/wafl/vv/vvol_wack_notes.txt#1 $

This document describes the changes to WAFL_check needed to support virtual volumes. Specifically, this only applies to hybrid virtual volumes. For completeness, pure virtual volumes will be mentioned briefly at the end of this document. It is assumed that the reader is reasonably familiar with the overall changes needed for hybrid virtual volumes (refer to design doc).


In this document, the following terms are used:

- "pvbn's" or "physical vbn's" - VBN's in the physical volume.
- "vvbn's" or "virtual vbn's" - VBN's in the virtual volume.
- "pvol" - physical volume.
- "vvol" - virtual volume.
- "vvid" - virtual volume id.

Depending on the context, "vvol" may be used interchangeably to refer to the vvol container file/inode.

The aggregate/blakegrate terminology hasn't been adopted in this document. In the meantime, feel free to substitute:

- "pvbn" -> "aggvbn" or "avbn" or "blakevbn" or "bvbn"
- "vvbn" -> "vbn"
- "pvol" -> "aggregate" or "blakegrate"
- "vvol" -> "volume"
- "vvid" -> "volume id"

Sections in this document:

1. New to the physical volume

2. WAFL_check on the physical volume

3. WAFL_check on a pvol with vvols being destroyed

4. WAFL_check on a vvol

5. Pure virtual volumes


1. New to the physical volume

-----------------------------------------------------------------------------

To support hybrid vvol's, the pvol has these additional changes:

- A new metafile at fileid 88, the "vvol_owner", or the "owner map". This file maps each pvbn to the vvol which owns it and to the vvol's corresponding vvbn.
- Inodes with a new type, WAFL_TYPE_VVOL. These are the "wafl" container inodes which reside in the pvol's metadirectory. The contents of this container file are the vvol itself.
- Regular inodes for the "raid" file, which contains raid-like information for a vvol. These live in the same directory as their corresponding container file.

2. WAFL_check on the physical volume

-----------------------------------------------------------------------------

Running WAFL_check on the pvol requires WAFL_check to be aware of the new additions, and to fix things if they are inconsistent. The general order of execution will thus be:

1. Make sure all used blocks in the pvol show up as being owned in the owner map by the pvol itself.
2. Make sure all WAFL_TYPE_VVOL inodes are accounted for.
3. Make sure owner map entries look sane and are owned by either the pvol or a vvol.
4. WAFL_check all vvol's.


Step 1: pvol blocks.

- After checking the pvol's inofile's buftree, check the owner map's buftree.
- Rescan the inofile's buftree, but this time, each block needs to be checked with the owner map. If the owner map shows that the block is not owned by the pvol, clear the entry (to indicate that it *is* owned by the pvol).
- For all remaining files, including metafiles, the buftree scan of their inode also checks the owner map to ensure ownership by the pvol. The only exception is for the vvol container blocks; they reside in the pvol, but are owned by the vvol.
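Step 1 can be sketched as follows. This is an illustration in Python, not the WAFL_check source: the owner map is modeled as a dict from pvbn to a (vvid, vvbn) pair, a cleared entry is modeled as (0, 0) on the assumption that a vvol id of zero means "owned by the pvol" (as Step 3 of the notes states), and claim_pvol_blocks is a hypothetical name.

```python
PVOL_VVID = 0  # assumption: vvol id zero marks a block owned by the pvol


def claim_pvol_blocks(pvol_file_pvbns, owner_map):
    """Clear owner-map entries for blocks reached from the pvol's own files.

    pvol_file_pvbns: pvbns found by buftree scans of the pvol's inofile and
        its other files. The caller leaves out vvol container blocks, which
        live in the pvol but belong to the vvol.
    owner_map: dict mapping pvbn -> (vvid, vvbn); a missing key counts as a
        cleared entry.
    Returns the pvbns whose entries had to be cleared.
    """
    fixed = []
    for pvbn in pvol_file_pvbns:
        vvid, _vvbn = owner_map.get(pvbn, (PVOL_VVID, 0))
        if vvid != PVOL_VVID:
            # The map claims a vvol owns a block the pvol is using: clear it.
            owner_map[pvbn] = (PVOL_VVID, 0)
            fixed.append(pvbn)
    return fixed
```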

Step 2: WAFL_TYPE_VVOL inodes

- While scanning inodes in the pvol, all fileid's of the WAFL_TYPE_VVOL inodes are stored in a list. This list will be used later, after we've finished checking the pvol.
- Once WAFL_check is completed with the pvol, all vvol's in the pvol are configured (aka "discovered"). Specifically, the pvol's metadirectory is scanned for any vvol's. All vvol's found are not mounted at this point in time.
- Compare the WAFL_TYPE_VVOL list found during the inofile scan against the list of vvol's discovered. For brevity, "wack list" refers to the first list while "pvol vvol list" refers to the latter.


There are four possibilities which can result:

- Case 0: If an inode is in neither list, do nothing (trivial case).
- Case 1: If an inode is in both lists, then we only need to check if WAFL_FLAG_METAFILE is set. If it isn't, set it.
- Case 2: If an inode is in the pvol's vvol list, but not in the wack list, then that means the inode is *not* of type WAFL_TYPE_VVOL. Change its type to WAFL_TYPE_VVOL, and set WAFL_FLAG_METAFILE if it isn't set already.
- Case 3: If an inode is in the wack list, but not in the pvol's vvol list, then this is either data corruption or a lost vvol. Set its type to WAFL_TYPE_REGULAR, clear WAFL_FLAG_METAFILE, and check if fbn's 1 or 2 look like valid volinfo's. If either does, it's likely we found a lost vvol, so move it to lost+found.
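The four cases amount to reconciling two sets of fileid's. The sketch below illustrates that reconciliation under stated assumptions: inodes are modeled as plain dicts with "type" and "metafile" keys, looks_like_volinfo is a caller-supplied hypothetical predicate standing in for the volinfo validity check, and reconcile_vvol_inodes is not a real WAFL_check function.

```python
WAFL_TYPE_VVOL = "WAFL_TYPE_VVOL"
WAFL_TYPE_REGULAR = "WAFL_TYPE_REGULAR"


def reconcile_vvol_inodes(inodes, wack_list, pvol_vvol_list, looks_like_volinfo):
    """Apply Cases 1-3 and return fileid's that should go to lost+found.

    inodes: dict fileid -> {"type": ..., "metafile": bool}
    looks_like_volinfo(fileid, fbn): hypothetical check of one file block.
    """
    wack, discovered = set(wack_list), set(pvol_vvol_list)
    lost_and_found = []
    # Case 0: inodes in neither list are never visited.
    for fileid in sorted(wack | discovered):
        inode = inodes[fileid]
        if fileid in wack and fileid in discovered:
            inode["metafile"] = True            # Case 1
        elif fileid in discovered:
            inode["type"] = WAFL_TYPE_VVOL      # Case 2
            inode["metafile"] = True
        else:
            inode["type"] = WAFL_TYPE_REGULAR   # Case 3
            inode["metafile"] = False
            if any(looks_like_volinfo(fileid, fbn) for fbn in (1, 2)):
                lost_and_found.append(fileid)   # probably a lost vvol
    return lost_and_found
```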

Step 3: owner map

- Check the owner map by scanning through each entry. If the entry corresponds to a physical block that is marked in-use by either the active map or the summary map, then:

  - Check if the vvol id is zero (aka, owned by the physical volume). If so, the vvbn value should also be zero. Clear
  - Check if the vvol id in the entry exists in the pvol's vvol list. If it is not, clear the entry.

  Note that the vvbn value is not checked, because we allow sparse vvol's, which can have a vvbn range that is larger than the pvol's vvbn range.
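The owner-map scan can be sketched as below. The notes are cut off after "Clear" in the zero-vvid check, so this sketch assumes, without confirmation from the source, that an entry with vvol id zero and a nonzero vvbn is cleared as a whole. The dict and set models and the name check_owner_map are likewise illustrative only.

```python
def check_owner_map(owner_map, active_map, summary_map, pvol_vvol_ids):
    """Clear owner-map entries that fail the Step 3 checks.

    owner_map: dict pvbn -> (vvid, vvbn)
    active_map, summary_map: sets of pvbns marked in-use
    pvol_vvol_ids: set of vvol id's discovered in the pvol
    Returns the pvbns whose entries were cleared.
    """
    cleared = []
    for pvbn, (vvid, vvbn) in sorted(owner_map.items()):
        if pvbn not in active_map and pvbn not in summary_map:
            continue  # the notes only describe checks for in-use blocks
        if vvid == 0:
            # ASSUMPTION: the repair action is truncated in the source.
            bad = vvbn != 0
        else:
            # The vvbn is deliberately not range-checked (sparse vvols).
            bad = vvid not in pvol_vvol_ids
        if bad:
            owner_map[pvbn] = (0, 0)
            cleared.append(pvbn)
    return cleared
```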

Step 4: Run WAFL_check on each individual vvol. See section 4.


4. WAFL_check on a vvol

-----------------------------------------------------------------------------

When checking all the vvol's within a pvol, offline vvol's are also made available. They are not actually mounted or brought online, so their mount state is unchanged, and they stay offline after the WAFL_check.

The code path is mostly the same as it is for a pvol. The vvol does:

wafl_load_superblks()
wack_boot_volume_from_disk()
wack_dowack_vol()
wack_dowack_vol_finish()

Vvol's which were offline also do wack_unload_volume().

The main differences for a vvol are:

- The owner map (fileid 88) doesn't exist, so it is never loaded nor checked.
- All blocks in the vvol are checked against its pvol's owner map for correct ownership. If a virtual volume correctly owns a block, mark this block as "claimed" in the bufs_claimed status file that WAFL_check uses.
- If the vvol does not own a block, then check if the block is already claimed. If it is, then this vvol doesn't own it and cannot claim it. Prune this block from the buftree.
- Otherwise, this vvol can claim it.

Once all vvol's have been checked, the bufs_claimed file is scanned to find any unclaimed blocks. Unclaimed blocks are then given to the physical volume (clear the corresponding entry in the owner map), and moved to lost+found.
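The claim logic above can be illustrated with a set standing in for the bufs_claimed status file. Several details are assumptions made for this sketch and are not stated in the notes: "correctly owns" is taken to mean the owner-map entry equals (vvid, vvbn), a successful late claim rewrites the owner-map entry, and the unclaimed blocks of interest are those the owner map still assigns to some vvol. Both function names are hypothetical.

```python
def claim_vvol_blocks(vvid, vvol_blocks, owner_map, bufs_claimed):
    """Check one vvol's blocks; return the pvbns pruned from its buftree.

    vvol_blocks: iterable of (vvbn, pvbn) pairs referenced by the vvol
    owner_map: dict pvbn -> (vvid, vvbn); bufs_claimed: set of pvbns
    """
    pruned = []
    for vvbn, pvbn in vvol_blocks:
        if owner_map.get(pvbn) == (vvid, vvbn):
            bufs_claimed.add(pvbn)           # correctly owned
        elif pvbn in bufs_claimed:
            pruned.append(pvbn)              # someone else already claimed it
        else:
            owner_map[pvbn] = (vvid, vvbn)   # ASSUMPTION: claim updates the map
            bufs_claimed.add(pvbn)
    return pruned


def reclaim_unclaimed(owner_map, bufs_claimed):
    """After all vvols are checked, give unclaimed blocks back to the pvol."""
    unclaimed = sorted(
        pvbn for pvbn, (vvid, _vvbn) in owner_map.items()
        if vvid != 0 and pvbn not in bufs_claimed
    )
    for pvbn in unclaimed:
        owner_map[pvbn] = (0, 0)  # cleared entry: owned by the pvol
    return unclaimed  # the caller would move these to lost+found
```

Because a block that is not correctly owned goes to whichever vvol reaches it first, the outcome in this model depends on the order in which the vvols are checked.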


5. Pure virtual volumes

-----------------------------------------------------------------------------

If a physical volume only contained pure virtual volumes, then there would be no need for the owner map. In this case, WAFL_check only needs to be aware of the new inode type (WAFL_TYPE_VVOL) and do the comparison checking with the list of WAFL_TYPE_VVOL inodes encountered during the inofile scan against the list of vvols found during the vvol discovery phase.

Otherwise, running WAFL_check on a pure virtual volume will behave just like it does on a present day physical volume.


Topic Review

If you run WAFL_check at the special boot menu, what will be checked?
If you run wafliron at the special boot menu, what would be ironed?
If you wanted to check the status of wafliron, what command would you use?
If you wanted to wafliron a non-root aggregate, which command would you use?


Exercises


Exercise: WAFL_check and wafliron

Objective

When you have completed this module, you will be able to do the following:

• Execute WAFL_check at the special boot prompt

• Recognize the differences when checking aggregates vs. traditional volumes

• Execute wafliron and view status of the operation

Exercise Overview

This exercise is to highlight the differences in output when executing WAFL_check and wafliron on aggregates and flexible volumes.

Time Estimate

20 Minutes

Required Hardware, Software, and Tools

Hardware

• Standard class setup

Software

• DOT 7.0

Start of Exercise

WAFL_check on aggregates and volumes.

Step Action

1. Start the simulator by executing the maytag.L file. When prompted, enter Y to the floppy boot question.

2. After the 1-5 menu appears, enter 22/7 to view the hidden commands.


3. Enter the following:

WAFL_check

When prompted, answer “Yes” or “y” for each aggregate you wish to WAFL_check.

The system will output what is occurring with the WAFL_check. Notice that the phases shown in [ ] are operations being performed on flexible volumes. The system will then display the status and ask you to reboot.

Watch the messages file for a status.

4. This is just an example of a WAFL_check on a system with a traditional vol0 and aggr1 containing a flexible volume.

Selection (1-5)? WAFL_check
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: Disk 3a.33 is a primary mailbox disk
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: Disk 3a.17 is a primary mailbox disk
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on primary side
Mon Oct 18 20:59:00 GMT [raid.assim.disk.brokenPreAssim:error]: Broken Disk 3a.24 Shelf ? Bay ? [NETAPP X270_SCHT6036F10 NA05] S/N [3JA72MSH000074297CC7] detected prior to assimilation. It should be removed.
Mon Oct 18 20:59:00 GMT [raid.disk.unload.done:info]: Unload of Disk 3a.24 Shelf 1 Bay 8 [NETAPP X270_SCHT6036F10 NA05] S/N [3JA72MSH000074297CC7] has completed successfully
Check vol0? y
Check aggr1? y
Checking vol0...
WAFL_check NetApp Release RanchorsteamN_041010_2215
Starting at Mon Oct 18 20:59:09 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 1
Phase 3b: Scan inode file normal files.


(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%) (inodes 30%) (inodes 35%) (inodes 40%) (inodes 45%) (inodes 50%) (inodes 55%) (inodes 60%) (inodes 65%) (inodes 70%) (inodes 75%) (inodes 80%) (inodes 85%) (inodes 90%) (inodes 95%)
Phase 3b time in seconds: 5
Phase 3 time in seconds: 6
Phase 4: Scan directories.
(dirs 15%) (dirs 26%) (dirs 45%) (dirs 46%) (dirs 46%) (dirs 66%) (dirs 75%) (dirs 81%) (dirs 92%)
Phase 4 time in seconds: 2
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 4
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 7
Phase 6 time in seconds: 11
WAFL_check total time in seconds: 19
(No filesystem state changed.)
Checking aggr1...
WAFL_check NetApp Release RanchorsteamN_041010_2215


Starting at Mon Oct 18 20:59:29 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 0
Phase 3b: Scan inode file normal files.
(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%) (inodes 30%) (inodes 35%) (inodes 41%) (inodes 46%) (inodes 51%) (inodes 56%) (inodes 61%) (inodes 66%) (inodes 71%) (inodes 76%) (inodes 82%) (inodes 87%) (inodes 92%) (inodes 97%)
Phase 3b time in seconds: 2
Phase 3 time in seconds: 2
Phase 4: Scan directories.
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Checking volume flexvol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 0
Phase [5.3b]: Scan inode file normal files.
(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%)


(inodes 30%) (inodes 35%) (inodes 40%) (inodes 45%) (inodes 50%) (inodes 55%) (inodes 60%) (inodes 65%) (inodes 70%) (inodes 75%) (inodes 80%) (inodes 85%) (inodes 90%) (inodes 95%)
Phase [5.3b] time in seconds: 1
Phase [5.3] time in seconds: 3
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 0
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 1
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 2
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 1
Phase [5.6] time in seconds: 4
Volume flexvol1 WAFL_check time in seconds: 8
(No filesystem state changed.)
Phase 5b time in seconds: 8
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 2
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 1
Phase 6 time in seconds: 3
WAFL_check total time in seconds: 13
(No filesystem state changed.)
Press any key to reboot system.
[LCD:info] Rebooting


wafliron on the root aggregate

Step Action

1. Start the simulator by executing the maytag.L file. When prompted, enter Y to the floppy boot question.

2. At the 1-5 menu, enter 22/7 to display the secret list of commands. Note that wafliron is listed at the bottom.

3. At the 1-5 menu, enter wafliron.

4. The filer will start wafliron and begin to boot up. View the console messages as the system boots.

5. Once booted, enter the status commands occasionally and note wafliron's progress on the volumes:

filer> priv set advanced
filer> vol wafliron status
filer> aggr wafliron status