02_aggr_wafl_v2
TRANSCRIPT
Module 2. WAFL Inconsistencies
DOT 7.0 Update Aggregate Performance
Student Guide
Objectives
At the conclusion of this module, you will be able to:
– Describe how to fix WAFL inconsistencies with aggregates and flexible volumes
– Execute WAFL_check and wafliron on flexible volumes and aggregates
Aggregates & Flexible Volumes
With Aggregates & Flexible Volumes
– Data is actually stored in the aggregate
– Therefore, file system inconsistencies actually live in the aggregate
– Aggregates can be inconsistent
– WAFL_check and wafliron are aggregate operations
Tools for Aggregates
Filer offline
– WAFL_check
  • Will prompt for each aggregate to check (also checks all flexible volumes contained inside the aggregate)
– WAFL_check <aggrname> (will check flexible volumes contained in the named aggregate)
Filer offline momentarily
– wafliron (from the 1-5 menu)
  • Checks every volume. This includes aggregates with flexvols, traditional volumes, and even the root volume
– aggr wafliron (from command prompt)
  • Non-root aggregates
Utilized on traditional volumes as before
  • vol wafliron (from command prompt)
  • Can use aggr wafliron as well
Running WAFL_check
Running WAFL_check will check flexible volumes in aggregate
Selection (1-5)? WAFL_check aggr1
Checking aggr1...
WAFL_check NetApp Release RanchorsteamN_040517_2215
Starting at Wed May 19 00:43:27 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
...
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Running WAFL_check
Checking volume vol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
...
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 0
Phase [5.6] time in seconds: 0
WAFL_check time in seconds: 7
(No filesystem state changed.)
Phase 5b time in seconds: 7
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 0
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 0
Phase 6 time in seconds: 0
WAFL_check total time in seconds: 14
(No filesystem state changed.)
Note: The [ ] around the phase number indicates an operation on a flexible volume.
Changes to flexible volumes are batched up and run at the end of the job.
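As a rough illustration of the bracket convention noted above, the following Python sketch separates aggregate-level phase lines from flexible-volume phase lines in captured WAFL_check console output. The function name and the regular expression are illustrative assumptions based only on the sample output in this module; they are not part of any NetApp tool.

```python
import re

# Assumed line shapes, taken from the sample output in this module:
#   "Phase 5b: Check volume contents"      -> aggregate-level phase
#   "Phase [5.6d]: Check blocks used."     -> flexible-volume phase
PHASE_RE = re.compile(r"^Phase (\[?)([0-9]+(?:\.[0-9]+)?[a-z]?)\]?[: ]")


def classify_phase(line):
    """Return (scope, phase_id) for a phase line, or None for any other line.

    scope is "flexvol" when the phase number is wrapped in [ ],
    otherwise "aggregate".
    """
    match = PHASE_RE.match(line.strip())
    if match is None:
        return None
    scope = "flexvol" if match.group(1) == "[" else "aggregate"
    return scope, match.group(2)
```

Feeding each console line through `classify_phase` makes it easy to see which parts of a run touched the aggregate itself and which touched a contained flexible volume.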
Running WAFL_check
Upon completion of WAFL_check you will be prompted to apply changes
– These changes have been queued to be fixed on aggregates and flexible volumes inside the aggregate
Running wafliron
filer*> aggr wafliron start aggr1
Wed May 19 00:38:20 GMT [wafl.iron.start:notice]: Starting wafliron on aggregate aggr1.
Wed May 19 00:38:21 GMT [wafl.iron.start:notice]: Starting wafliron on volume vol1.
filer*> Wed May 19 00:39:05 GMT [wafl.scan.iron.done:info]: Volume vol1, wafliron completed.
Wed May 19 00:39:06 GMT [wafl.scan.iron.done:info]: Aggregate aggr1, wafliron completed.
Wed May 19 00:39:12 GMT [wafl.scan.typebits.done:info]: Type bit scan done on vol vol1.
filer*>
Note: wafliron will iron the flexible volumes in the aggregate.
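The start and completion messages above can be paired up to see what is still being ironed. The Python sketch below does this for captured console text. The message patterns are inferred from the sample output on this slide only, and the function name is hypothetical, so treat it as a starting point rather than a complete parser.

```python
import re

# Patterns inferred from the two message types shown in the sample output.
START_RE = re.compile(
    r"\[wafl\.iron\.start:notice\]: Starting wafliron on (aggregate|volume) (\S+?)\.$"
)
DONE_RE = re.compile(
    r"\[wafl\.scan\.iron\.done:info\]: (Aggregate|Volume) (\S+), wafliron completed\.$"
)


def pending_irons(lines):
    """Return the set of (kind, name) targets that started but have not completed."""
    pending = set()
    for raw in lines:
        line = raw.strip()
        started = START_RE.search(line)
        if started:
            pending.add((started.group(1).lower(), started.group(2)))
            continue
        done = DONE_RE.search(line)
        if done:
            pending.discard((done.group(1).lower(), done.group(2)))
    return pending
```

An empty result after the last message means every aggregate and flexible volume that reported a start also reported completion.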
wafliron status
filer*> aggr wafliron status
wafliron is active on aggregate: aggr1
Scanning (7% done).
filer*>
filer*> wafl scan status
Aggregate aggr1:
Scan id   Type of scan      progress
3         wafliron demand   73 (27/27) of 1092
Volume vol1:
Scan id   Type of scan      progress
4         wafliron demand   69 (20/20) of 693
Notice there are 2 scanners that are active.
– Scan id 3 is ironing the aggregate aggr1
– Scan id 4 is ironing the flexible volume vol1 that is contained inside of aggr1.
wafliron with -f <volume> will force a wafliron on volumes which are read-only (snapshots/SyncMirror)
Logs and Errors
Logs
– wafliron still logs errors with syslog messages
– WAFL_check is the same as before for flexible volumes
  • Logged to /etc/crash/wafl/
– For aggregates, it's a two-step process:
  – /vol/<aggrname>/WAFL_check_logs/WAFL_check (kept in the metadir in the aggregate, as the aggregate is not mountable)
  – Once booted, info is placed into:
    » /vol/<rootvol>/etc/WAFL_check_logs/<aggrname>/WAFL_check
– If there are no errors, a log will NOT be created
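To make the two-step log location for aggregates concrete, here is a small Python sketch that fills in the two path templates listed above. The helper name is hypothetical; only the path layouts come from the slide.

```python
def wafl_check_log_paths(aggr_name, root_vol):
    """Return (path inside the aggregate metadir, path after boot).

    Both templates are copied from the slide; the function itself is only
    an illustration of how <aggrname> and <rootvol> are substituted.
    """
    in_aggregate = f"/vol/{aggr_name}/WAFL_check_logs/WAFL_check"
    after_boot = f"/vol/{root_vol}/etc/WAFL_check_logs/{aggr_name}/WAFL_check"
    return in_aggregate, after_boot
```

For an aggregate named aggr1 on a filer whose root volume is vol0, the second path is the one to look at once the system has booted, keeping in mind that no log exists if WAFL_check found no errors.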
Errors
filer*> vol wafliron start vol1
Cannot run wafliron on a flexible volume.
filer*> vol wafliron start aggr1
vol wafliron: 'aggr1' is an aggregate; use 'aggr wafliron'
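The error messages above follow from a simple rule: wafliron is started on the container that actually holds the data. A minimal Python sketch of that rule follows. The function and the `kind` labels are hypothetical; the command strings match those shown in this module.

```python
def wafliron_start_command(name, kind):
    """Pick the wafliron start command for a container.

    kind is one of "aggregate", "traditional", or "flexible" (labels chosen
    for this sketch). Flexible volumes cannot be ironed directly; the
    containing aggregate is ironed instead.
    """
    if kind == "aggregate":
        return f"aggr wafliron start {name}"
    if kind == "traditional":
        # The slides note that aggr wafliron can also be used here.
        return f"vol wafliron start {name}"
    if kind == "flexible":
        raise ValueError(
            f"cannot run wafliron on flexible volume {name}; "
            "start it on the containing aggregate"
        )
    raise ValueError(f"unknown container kind: {kind}")
```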
Internal information
File System Inconsistencies
With Aggregates and Flexible Volumes
Additional Information
Anchor Steam TOI
– Lots of bit-level and engineering-level information
http://web.netapp.com/engineering/projects/wafl/vv/vvol_wack_notes.txt
Eng notes: These notes are truncated; for the full notes, go to the URL.
Notes and thoughts about WAFL_check and (hybrid) virtual volumes
Andy Kahn
Last updated Feb 7th, 2003
$Id: //depot/doc/main/project/wafl/vv/vvol_wack_notes.txt#1 $
This document describes the changes to WAFL_check needed to support
virtual volumes. Specifically, this only applies to hybrid virtual
volumes. For completeness, pure virtual volumes will be mentioned
briefly at the end of this document. It is assumed that the reader is
reasonably familiar with the overall changes needed for hybrid virtual
volumes (refer to design doc).
In this document, the following terms are used:
- "pvbn's" or "physical vbn's" - VBN's in the physical volume.
- "vvbn's" or "virtual vbn's" - VBN's in the virtual volume.
- "pvol" - physical volume.
- "vvol" - virtual volume.
- "vvid" - virtual volume id.
Depending on the context, "vvol" may be used interchangeably to refer
to the vvol container file/inode.
The aggregate/blakegrate terminology hasn't been adopted in this
document. In the meantime, feel free to substitute:
- "pvbn" -> "aggvbn" or "avbn" or "blakevbn" or "bvbn"
- "vvbn" -> "vbn"
- "pvol" -> "aggregate" or "blakegrate"
- "vvol" -> "volume"
- "vvid" -> "volume id"
Sections in this document:
1. New to the physical volume
2. WAFL_check on the physical volume
3. WAFL_check on a pvol with vvols being destroyed
4. WAFL_check on a vvol
5. Pure virtual volumes
1. New to the physical volume
-----------------------------------------------------------------------------
To support hybrid vvol's, the pvol has these additional changes:
- A new metafile at fileid 88, the "vvol_owner", or the "owner map".
This file maps pvbn's to the vvol which owns it and the vvol's
corresponding vvbn.
- Inodes with a new type, WAFL_TYPE_VVOL. These are the "wafl"
container inodes which reside in the pvol's metadirectory. The
contents of this container file are the vvol itself.
- Regular inodes for the "raid" file, which contains raid-like
information for a vvol. These live in the same directory as their
corresponding container file.
2. WAFL_check on the physical volume
-----------------------------------------------------------------------------
Running WAFL_check on the pvol requires WAFL_check to be aware of the
new additions, and to fix things if they are inconsistent. The
general order of execution will thus be:
1. Make sure all used blocks in the pvol show up as being
owned in the owner map by the pvol itself.
2. Make sure all WAFL_TYPE_VVOL inodes are accounted for.
3. Make sure owner map entries look sane and are owned by either the pvol or a vvol.
4. WAFL_check all vvol's.
Step 1: pvol blocks.
- After checking the pvol's inofile's buftree, check the owner map's
buftree.
- Rescan the inofile's buftree, but this time, each block needs to be
checked with the owner map. If the owner map shows that the block
is not owned by the pvol, clear the entry (to indicate that it *is*
owned by the pvol).
- For all remaining files, including metafiles, the buftree scan of
their inode also checks the owner map to ensure ownership by the
pvol. The only exception is for the vvol container blocks; they
reside in the pvol, but are owned by the vvol.
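The ownership check in Step 1 can be sketched in a few lines of Python. This is a toy model, not the WAFL_check implementation: the owner map is represented as a dict from pvbn to a (vvid, vvbn) pair, and a cleared entry is assumed to be (0, 0), following the later statement that vvol id zero means "owned by the physical volume". All names are illustrative.

```python
PVOL_OWNER = (0, 0)  # assumed encoding of a cleared entry: vvid 0, vvbn 0


def fix_pvol_ownership(owner_map, pvol_file_blocks, container_blocks):
    """Clear owner-map entries for pvol blocks wrongly attributed to a vvol.

    owner_map:        dict pvbn -> (vvid, vvbn), modified in place
    pvol_file_blocks: pvbns reached while scanning buftrees of pvol files
    container_blocks: set of pvbns that belong to vvol container files;
                      these live in the pvol but are owned by the vvol,
                      so they are skipped
    Returns the list of pvbns whose entries were cleared.
    """
    cleared = []
    for pvbn in pvol_file_blocks:
        if pvbn in container_blocks:
            continue
        vvid, _vvbn = owner_map.get(pvbn, PVOL_OWNER)
        if vvid != 0:  # owner map claims a vvol owns a pvol block
            owner_map[pvbn] = PVOL_OWNER
            cleared.append(pvbn)
    return cleared
```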
Step 2: WAFL_TYPE_VVOL inodes
- While scanning inodes in the pvol, all fileid's of the
WAFL_TYPE_VVOL inodes are stored in a list. This list will be used
later, after we've finished checking the pvol.
- Once WAFL_check is completed with the pvol, all vvol's in the pvol
are configured (aka "discovered"). Specifically, the pvol's
metadirectory is scanned for any vvol's. All vvol's found are not
mounted at this point in time.
- Compare the WAFL_TYPE_VVOL list found during the inofile scan
against the list of vvol's discovered. For brevity, "wack list"
refers to the first list while "pvol vvol list" refers to the
latter.
There are four possibilities which can result:
- Case 0: If an inode is in neither list, do nothing (trivial case).
- Case 1: If an inode is in both lists, then we only need to check whether WAFL_FLAG_METAFILE is set. If it isn't, set it.
- Case 2: If an inode is in the pvol's vvol list, but not in the wack list, then that means the inode is *not* of type WAFL_TYPE_VVOL. Change its type to WAFL_TYPE_VVOL, and set WAFL_FLAG_METAFILE if it isn't set already.
- Case 3: If an inode is in the wack list, but not in the pvol's vvol list, then this is either data corruption or a lost vvol. Set its type to WAFL_TYPE_REGULAR, clear WAFL_FLAG_METAFILE, and check if fbn's 1 or 2 look like valid volinfo's. If either does, it's likely we found a lost vvol, so move it to lost+found.
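The four cases amount to a two-list membership test. The Python sketch below classifies a fileid against the "wack list" and the "pvol vvol list" and returns the case number. The function name and the action summaries are illustrative; the repairs themselves are only described in text, not performed.

```python
# Short summaries of the repair described for each case in the notes.
CASE_ACTIONS = {
    0: "nothing to do",
    1: "ensure WAFL_FLAG_METAFILE is set",
    2: "change type to WAFL_TYPE_VVOL and set WAFL_FLAG_METAFILE",
    3: "set WAFL_TYPE_REGULAR, clear WAFL_FLAG_METAFILE, check fbn 1/2 for a volinfo",
}


def classify_inode(fileid, wack_list, pvol_vvol_list):
    """Return the case number (0-3) for one inode.

    wack_list:      fileids of WAFL_TYPE_VVOL inodes seen in the inofile scan
    pvol_vvol_list: fileids of vvols found during vvol discovery
    """
    in_wack = fileid in wack_list
    in_pvol = fileid in pvol_vvol_list
    if in_wack and in_pvol:
        return 1
    if in_pvol:
        return 2
    if in_wack:
        return 3
    return 0
```

For example, with a wack list of {100, 101} and a pvol vvol list of {101, 102}, fileid 101 is Case 1, fileid 102 is Case 2, and fileid 100 is Case 3.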
Step 3: owner map
- Check the owner map by scanning through each entry. If the entry
corresponds to a physical block that is marked in-use by either the
active map or the summary map, then:
  - Check if the vvol id is zero (aka, owned by the physical volume). If so, the vvbn value should also be zero. Clear
  - Check if the vvol id in the entry exists in the pvol's vvol list. If it does not, clear the entry.
Note that the vvbn value is not checked, because we allow
sparse vvol's, which can have a vvbn range that is larger than the
pvol's vvbn range.
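A toy Python model of the Step 3 scan is shown below. The owner map is again a dict from pvbn to (vvid, vvbn), and "clearing" an entry is assumed to mean resetting it to (0, 0). Because the source notes are truncated for the vvid-zero case, the sketch only flags those entries and does not guess at the repair. All names are illustrative.

```python
def check_owner_map(owner_map, in_use, known_vvids):
    """Scan owner-map entries for in-use physical blocks.

    owner_map:   dict pvbn -> (vvid, vvbn), modified in place
    in_use:      set of pvbns marked in-use by the active map or summary map
    known_vvids: set of vvol ids present in the pvol's vvol list
    Returns (cleared, flagged): pvbns whose entry named an unknown vvol and
    was cleared, and pvbns with vvid 0 but a nonzero vvbn.
    """
    cleared, flagged = [], []
    for pvbn, (vvid, vvbn) in sorted(owner_map.items()):
        if pvbn not in in_use:
            continue  # only entries for in-use blocks are examined
        if vvid == 0:
            if vvbn != 0:
                # Repair action is truncated in the source notes; flag only.
                flagged.append(pvbn)
        elif vvid not in known_vvids:
            owner_map[pvbn] = (0, 0)  # assumed meaning of "clear the entry"
            cleared.append(pvbn)
        # The vvbn of a vvol-owned entry is deliberately not range-checked,
        # since sparse vvols may have a larger vvbn range.
    return cleared, flagged
```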
Step 4: Run WAFL_check on each individual vvol. See section 4.
4. WAFL_check on a vvol
-----------------------------------------------------------------------------
When checking all the vvol's within a pvol, offline vvol's are also
made available. They are not actually mounted or brought online, so
their mount state is unchanged, and they stay offline after the
WAFL_check.
The code path is mostly the same as it is for a pvol. The vvol does:
    wafl_load_superblks()
    wack_boot_volume_from_disk()
    wack_dowack_vol()
    wack_dowack_vol_finish()
Vvol's which were offline also do wack_unload_volume().
The main differences for a vvol are:
- The owner map (fileid 88) doesn't exist, so it is never loaded nor
checked.
- All blocks in the vvol are checked against its pvol's owner map for
correct ownership. If a virtual volume correctly owns a block, mark
this block as "claimed" in the bufs_claimed status file that
WAFL_check uses.
- If the vvol does not own a block, then check if the block is already
claimed. If it is, then this vvol doesn't own it and cannot claim
it. Prune this block from the buftree.
- Otherwise, this vvol can claim it.
Once all vvol's have been checked, the bufs_claimed file is scanned to
find any unclaimed blocks. Unclaimed blocks are then given to the
physical volume (clear the corresponding entry in the owner map), and moved to lost+found.
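The claim logic described in this section can be sketched as follows. This Python model makes several assumptions that the notes do not spell out: the bufs_claimed file is modeled as a dict, vvols are processed in the order given, only blocks that the owner map attributes to a vvol are candidates for "unclaimed", and a rightful owner arriving after another vvol has claimed a block simply takes the claim. Names are illustrative.

```python
def claim_blocks(owner_map, vvol_blocks):
    """Model the claimed-block bookkeeping across all vvols in a pvol.

    owner_map:   dict pvbn -> (vvid, vvbn), modified in place
    vvol_blocks: dict vvid -> list of pvbns referenced by that vvol's buftrees
    Returns (claimed, pruned, lost):
      claimed: dict pvbn -> vvid that claimed it
      pruned:  list of (vvid, pvbn) references dropped from a vvol's buftree
      lost:    pvbns never claimed; their owner-map entry is cleared and the
               block is handed to the pvol for lost+found
    """
    claimed = {}
    pruned = []
    for vvid, pvbns in vvol_blocks.items():
        for pvbn in pvbns:
            owner_vvid = owner_map.get(pvbn, (0, 0))[0]
            if owner_vvid == vvid:
                claimed[pvbn] = vvid           # correctly owned: claim it
            elif pvbn in claimed:
                pruned.append((vvid, pvbn))    # someone else has it: prune
            else:
                claimed[pvbn] = vvid           # not owned, but unclaimed
    lost = []
    for pvbn, (vvid, _vvbn) in sorted(owner_map.items()):
        if vvid != 0 and pvbn not in claimed:
            owner_map[pvbn] = (0, 0)           # give the block to the pvol
            lost.append(pvbn)
    return claimed, pruned, lost
```

With two vvols that both reference the same block, the vvol listed as owner keeps it and the other reference is pruned; a block the owner map assigns to a vvol that no vvol references ends up in the lost list.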
5. Pure virtual volumes
-----------------------------------------------------------------------------
If a physical volume only contained pure virtual volumes, then there
would be no need for the owner map. In this case, WAFL_check only
needs to be aware of the new inode type (WAFL_TYPE_VVOL) and do the
comparison checking with the list of WAFL_TYPE_VVOL inodes encountered
during the inofile scan against the list of vvols found during the
vvol discovery phase.
Otherwise, running WAFL_check on a pure virtual volume will behave
just like it does on a present day physical volume.
Topic Review
– If you run WAFL_check at the Special Boot Menu, what will be checked?
– If you run wafliron at the Special Boot Menu, what would be ironed?
– If you wanted to check the status of wafliron, what command would you use?
– If you wanted to wafliron a non-root aggregate, which command would you use?
Exercises
Exercise: WAFL_check and wafliron
Objective
When you have completed this module, you will be able to do the following:
• Execute WAFL_check at the special boot prompt
• Recognize the differences when checking aggregates vs. traditional volumes
• Execute wafliron and view status of the operation
Exercise Overview
This exercise is to highlight the differences in output when executing WAFL_check and wafliron on aggregates and flexible volumes.
Time Estimate
20 Minutes
Required Hardware, Software, and Tools
Hardware
• Standard class setup
Software
• DOT 7.0
Start of Exercise
WAFL_check on aggregates and volumes.
Step Action
1. Start the simulator by executing the maytag.L file. When prompted, enter Y to the floppy boot question.
2. After the 1-5 menu appears, enter 22/7 to view the hidden commands.
3. Enter the following:
WAFL_check
When prompted, answer “Yes” or “y” to which aggregates you wish to WAFL_check
The system will display what is occurring during the WAFL_check. Notice that phases shown in [ ] are operations being performed on flexible volumes. The system will then display the status and ask you to reboot.
Watch the messages file for a status.
4. This is just an example of a WAFL_check on a system with a traditional vol0 and aggr1 containing a flexible volume.
Selection (1-5)? WAFL_check
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: Disk 3a.33 is a primary mailbox disk
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: Disk 3a.17 is a primary mailbox disk
Mon Oct 18 20:58:59 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on primary side
Mon Oct 18 20:59:00 GMT [raid.assim.disk.brokenPreAssim:error]: Broken Disk 3a.24 Shelf ? Bay ? [NETAPP X270_SCHT6036F10 NA05] S/N [3JA72MSH000074297CC7] detected prior to assimilation. It should be removed.
Mon Oct 18 20:59:00 GMT [raid.disk.unload.done:info]: Unload of Disk 3a.24 Shelf 1 Bay 8 [NETAPP X270_SCHT6036F10 NA05] S/N [3JA72MSH000074297CC7] has completed successfully
Check vol0? y
Check aggr1? y
Checking vol0...
WAFL_check NetApp Release RanchorsteamN_041010_2215
Starting at Mon Oct 18 20:59:09 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 1
Phase 3b: Scan inode file normal files.
(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%) (inodes 30%) (inodes 35%) (inodes 40%) (inodes 45%) (inodes 50%) (inodes 55%) (inodes 60%) (inodes 65%) (inodes 70%) (inodes 75%) (inodes 80%) (inodes 85%) (inodes 90%) (inodes 95%)
Phase 3b time in seconds: 5
Phase 3 time in seconds: 6
Phase 4: Scan directories.
(dirs 15%) (dirs 26%) (dirs 45%) (dirs 46%) (dirs 46%) (dirs 66%) (dirs 75%) (dirs 81%) (dirs 92%)
Phase 4 time in seconds: 2
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 4
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 7
Phase 6 time in seconds: 11
WAFL_check total time in seconds: 19
(No filesystem state changed.)
Checking aggr1...
WAFL_check NetApp Release RanchorsteamN_041010_2215
Starting at Mon Oct 18 20:59:29 GMT 2004
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 0
Phase 3b: Scan inode file normal files.
(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%) (inodes 30%) (inodes 35%) (inodes 41%) (inodes 46%) (inodes 51%) (inodes 56%) (inodes 61%) (inodes 66%) (inodes 71%) (inodes 76%) (inodes 82%) (inodes 87%) (inodes 92%) (inodes 97%)
Phase 3b time in seconds: 2
Phase 3 time in seconds: 2
Phase 4: Scan directories.
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Checking volume flexvol1...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 0
Phase [5.3b]: Scan inode file normal files.
(inodes 5%) (inodes 10%) (inodes 15%) (inodes 20%) (inodes 25%)
(inodes 30%) (inodes 35%) (inodes 40%) (inodes 45%) (inodes 50%) (inodes 55%) (inodes 60%) (inodes 65%) (inodes 70%) (inodes 75%) (inodes 80%) (inodes 85%) (inodes 90%) (inodes 95%)
Phase [5.3b] time in seconds: 1
Phase [5.3] time in seconds: 3
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 0
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 1
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 2
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 1
Phase [5.6] time in seconds: 4
Volume flexvol1 WAFL_check time in seconds: 8
(No filesystem state changed.)
Phase 5b time in seconds: 8
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 2
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 0
Phase 6d: Check blocks used.
Phase 6d time in seconds: 1
Phase 6 time in seconds: 3
WAFL_check total time in seconds: 13
(No filesystem state changed.)
Press any key to reboot system.
[LCD:info] Rebooting
wafliron on the root aggregate
Step Action
1. Start the simulator by executing the maytag.L file. When prompted, enter Y to the floppy boot question.
2. At the 1-5 menu, enter 22/7 to display the secret list of commands. Note that wafliron is listed at the bottom.
3. At the 1-5 menu, enter wafliron.
4. The filer will start wafliron and begin to boot up. View the console messages as the system boots.
5. Once booted, enter the status commands occasionally and note wafliron's progress on the volumes:
filer> priv set advanced
filer> vol wafliron status
filer> aggr wafliron status