EMC Proprietary and Confidential - For Internal Use Only

DD OS 4.9 Offline Diagnostics Suite User Guide

Backup Recovery Systems Division
Data Domain LLC
2421 Mission College Boulevard, Santa Clara, CA 95054
866-WE-DDUPE; 408-980-4800
759-0021-0001 Revision C
May 18, 2012

Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC, Data Domain, and Global Compression are registered trademarks or trademarks of EMC Corporation in the United States and/or other countries.

All other trademarks used herein are the property of their respective owners.

DD OS 4.9 Offline Diagnostics Suite User Guide

This guide provides a troubleshooting flow for selecting the appropriate diagnostics for your problem and then running the diagnostics to identify the faulty field replaceable unit (FRU) and generate recommended service actions.

This document covers the topics shown in the following table:

Overview and Supported Systems
Requirements
Connect a Console to the System
Select and Run Diagnostics
Get Log Information After Running Diagnostics
Diagnostic Test Descriptions

Overview and Supported Systems

Data Domain provides both online and offline diagnostics for its systems:

• Online diagnostics are invoked on the Data Domain operating system (DD OS) command line. Some diagnostics, such as system status and enclosure show all, which report the status of fans, power supplies, and temperature sensors, are also run automatically in the background to monitor the system during runtime. Email alerts are issued if problems are detected.

• Offline diagnostics are run in response to customer problem reports, such as when a system cannot be booted to online operation, a card or disk is absent, or memory, connectivity, or configuration problems are suspected. Offline diagnostics check FRUs such as the system controller disks, motherboard, memory (DIMMs), NVRAM card, and other hardware.

Major differences in the use of online and offline diagnostics are:

• Offline diagnostics are used when the system is unable to come online.

• Offline diagnostics are used if the system is hanging frequently or has serious performance issues. These diagnostics can isolate performance problems to specific components.

• After online diagnostics detect a problem, offline diagnostics may be needed for further fault isolation or confirmation.

• Online diagnostics detect problems only when they access the part of the component that has the problem, whereas offline diagnostics test the full range of the component (an entire disk, for example) and can detect latent faults.

Data Protection

Note: Diagnostics are run in offline mode, and require a reboot to load.

In offline mode, the Data Domain filesystem is not running, and no customer data is flowing through the system. Tests are completely data-safe and non-destructive.

Supported Systems

Table 1 shows which diagnostic tests can be run on each supported Data Domain system (DDxxx) and which FRU is tested.

Caution: The offline diagnostics described in this guide support only the DD OS 4.9 systems shown in Table 1. Do not run these diagnostics on any other DD OS version or system, as unexpected behavior may result.

For test coverage information, see “Diagnostic Test Descriptions” on page 33.

Table 1: Offline Diagnostics Support for DD OS 4.9 Systems (X=Supported)

Systems covered: DD140, DD610, DD630, DD880, DD880g, DD670

FRU Tested | Test Name | Support Marks
System Controller Boot Disk | HDD Quick Test | X X X
System Controller Disks (all) | HDD Comprehensive Test | X X X
Fibre Channel HBA Card, Cable | FC Diagnostic (VTL systems are not supported) | DD880g Only
Memory (DIMMs) | Memory Diagnostics | X X X
Motherboard | CPU Test | X X X
Motherboard | CPU MCE Test | X X X
Motherboard | Motherboard PCIe Topology Test | X
Ethernet Network Interface Card (NIC) | Network Internal Loopback Test | X
Ethernet Network Interface Card (NIC) | Network External Loopback Test | X
NVRAM Card | NVRAM Card Test | X X X
Serial Attached SCSI (SAS) Daughter and HBA Expansion Cards | SAS Diagnostics Test | DD880 Only, X

Requirements

System Controller Boot Disk with DD OS 4.9

To boot offline diagnostics, you must have a functional system controller boot disk with DD OS 4.9 installed. This release contains the Offline Diagnostics Suite.

System Console

You must have one of the following consoles connected to the system:

• PS/2 keyboard or USB keyboard with a VGA monitor.

(A USB-to-PS2 converter is needed to use a PS/2 keyboard with DD880 and DD880g.)

• KVM console.

• Serial console or laptop with terminal emulation software such as SecureCRT, PuTTY, or HyperTerminal (required for running DD OS commands). A null modem cable with a DB9 female connector is required to connect a serial console or laptop to the system. Cisco “rollover” cables cannot be used.

If you plan to run diagnostics remotely, be aware that a person is still required onsite in the following situations:

• If the system has crashed or been powered down, it must be power cycled or powered on by a person at the site to boot offline diagnostics.

• If diagnostic log files cannot be written to the system disk, you can save them to a formatted USB key inserted by a person at the site.

Console connections and scenarios are described in “Connect a Console to the System” on page 8.

System and BIOS Passwords

If the system is up and running DD OS, you must log in as sysadmin to:

• Issue the system reboot command to boot offline diagnostics or enter BIOS mode to connect a serial console for the first time.

• Run certain DD OS diagnostic commands referenced in this guide.

If the system is powered down, you do not need sysadmin access to boot offline diagnostics or enter BIOS mode.

The Data Domain default sysadmin and BIOS passwords are provided with the procedures in this guide. However, if these passwords have been changed, you need the new passwords prior to starting the procedures.

USB Key (Optional)

After running diagnostics, log files are automatically saved to the system boot disk and to an external USB key, if one is inserted (see “Inserting and Removing a USB Key for Writing Logs” on page 20).

You may want to use a USB key to store log files if:

• Diagnostic logs cannot be written to the system disk. (You are prompted to insert a USB key, or cancel without saving logs.)

• You might not be able to reboot the system to online mode to access the offline diagnostic log file (a concatenation of all logs) on the system boot disk.

Requirements for the USB key (a.k.a. keychain drive, thumb drive, or flash memory stick) are:

• FAT32 (Unix VFAT) format

• 10 MB of free space
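As an illustration only, the free-space requirement above could be checked from a laptop before handing the key to the onsite person. This is a minimal sketch, not part of the diagnostics suite: the function names and the mount point argument are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the FAT32 format requirement is not checked.

```python
import shutil

# Assumption for this sketch: "10 MB" is taken as 10 * 1024 * 1024 bytes.
REQUIRED_FREE_BYTES = 10 * 1024 * 1024


def has_enough_free_space(free_bytes, required=REQUIRED_FREE_BYTES):
    """Return True if the given free byte count meets the requirement."""
    return free_bytes >= required


def usb_key_has_space(mount_point):
    """Check free space at a mount point (hypothetical path to the USB key).

    The filesystem type (FAT32) is not verified here.
    """
    return has_enough_free_space(shutil.disk_usage(mount_point).free)
```

For example, `usb_key_has_space("/media/usbkey")` would report whether a key mounted at that (hypothetical) path has room for the logs.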

For more information on viewing log files on the system or USB key after running offline diagnostics, see “Get Log Information After Running Diagnostics” on page 29.

System Downtime

The typical time required to run all tests in the suite is 45 to 70 minutes, depending on the system type and configuration.

Table 2 shows the maximum possible runtime for each test as displayed in the diagnostics user interface. Maximum runtime is the test execution time plus the time needed for the diagnostic to time out if it cannot complete. This is always greater than the test execution time.

Table 2: Maximum Runtimes for Individual Tests

Test Group | Test | Maximum Runtime
Network Interface Card | Network Internal Loopback Test | 30 minutes
Network Interface Card | Network External Loopback Test | 30 minutes
Memory | Memory Diagnostics | 63 minutes
Motherboard | CPU Test | 16 minutes
Motherboard | CPU MCE Test | 3 minutes, 10 seconds
Motherboard | Motherboard PCIe Topology Test | 1 minute
NVRAM Card | NVRAM Card Test | 3 minutes
HDD | HDD Quick Test | 10 minutes
HDD | HDD Comprehensive Test | 60 minutes
SAS | SAS Diagnostics Test | 20 or 30 minutes per board
Gateway | FC Diagnostic | 25 minutes
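When planning a maintenance window, it can help to add up the maximum runtimes of the tests you intend to select. The following is a rough sketch under stated assumptions: the selected tests are assumed to run one after another so that their maximum runtimes add, the SAS Diagnostics Test is handled through hypothetical parameters because Table 2 gives "20 or 30 minutes per board" without saying which applies, and the function names are illustrative rather than part of the suite.

```python
# Maximum runtimes transcribed from Table 2, in seconds.
# The SAS Diagnostics Test is excluded because its runtime is per board.
MAX_RUNTIME_SECONDS = {
    "Network Internal Loopback Test": 30 * 60,
    "Network External Loopback Test": 30 * 60,
    "Memory Diagnostics": 63 * 60,
    "CPU Test": 16 * 60,
    "CPU MCE Test": 3 * 60 + 10,
    "Motherboard PCIe Topology Test": 1 * 60,
    "NVRAM Card Test": 3 * 60,
    "HDD Quick Test": 10 * 60,
    "HDD Comprehensive Test": 60 * 60,
    "FC Diagnostic": 25 * 60,
}


def worst_case_runtime(tests, sas_boards=0, sas_minutes_per_board=30):
    """Sum the maximum runtimes, assuming the tests run sequentially.

    sas_boards and sas_minutes_per_board are hypothetical inputs for the
    SAS Diagnostics Test; 30 is used as the conservative default.
    """
    total = sum(MAX_RUNTIME_SECONDS[name] for name in tests)
    return total + sas_boards * sas_minutes_per_board * 60


def format_duration(seconds):
    """Render a second count as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} s"


selected = ["Memory Diagnostics", "CPU Test", "CPU MCE Test"]
print(format_duration(worst_case_runtime(selected)))  # 82 min 10 s
```

The result is an upper bound that includes timeout allowances, so it will normally exceed the time the tests actually take.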

Connect a Console to the System

If your system has a console connected, skip to “Select and Run Diagnostics” on page 12.

Data Domain systems provide the following connectors for attaching a console:

• DIN-type connectors for a PS/2 keyboard and a mouse (except DD880 and DD880g)

• USB-A receptacle port for a USB keyboard

• DB15 female for a VGA monitor

• DB9 male for a serial connection

Figure 1 shows the ways in which you can connect a console or laptop to the system.

[Figure 1 diagram. Direct connections to the Data Domain system: a KVM console, or a PS/2* or USB keyboard with a VGA monitor (DB15), or a serial console/server or laptop with terminal emulation software on a null modem cable from the PC serial port (DB9). Remote serial link: a serial console/server, a laptop with terminal emulation software, or a virtual console, connected to the system's DB9 port.]

*Not present on DD880 and DD880g.

Figure 1: System Console Connections

PS/2 and KVM Keyboards

Note: DD880 and DD880g systems do not provide a PS/2 port. A USB-to-PS2 converter is needed to use a PS/2 keyboard with these systems.

Make sure that the system is powered off when you connect a PS/2 or KVM keyboard to the system’s DIN-type connector. The PS/2 keyboard does not function at all unless it is plugged in before the system is powered up.

The system confirms the presence of the keyboard during powerup by flashing the Num Lock, Scroll Lock, and Caps Lock LEDs once.

For more information, see the Knowledge Base article, “Connecting a PS/2 Keyboard,” on the Data Domain Support Portal:

https://my.datadomain.com/download/kb/all/Connecting_a_PS-2_Keyboard.html (Support login is required.)

Serial Consoles and Laptops

Configure the terminal software to use the correct COM port with the following settings:

• Baud rate: 9600

• Bits: 8

• Parity: None

• Stop bits: 1

• Flow control: None

• Terminal emulation type: VT100

Note: The default terminal emulation type for PuTTY is ESC[n~. Change this to VT100+; otherwise, function keys do not work properly.
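The settings listed above can be recorded and compared against a terminal profile before connecting. The sketch below is illustrative only: the class and function names are hypothetical, it does not open a serial port, and it simply compares a candidate profile with the values in the list.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SerialSettings:
    """Defaults are the values from the settings list in this guide."""
    baud_rate: int = 9600
    data_bits: int = 8
    parity: str = "None"
    stop_bits: int = 1
    flow_control: str = "None"
    emulation: str = "VT100"


REQUIRED = SerialSettings()


def mismatches(candidate, required=REQUIRED):
    """Return {field: (found, expected)} for every setting that differs."""
    return {
        f.name: (getattr(candidate, f.name), getattr(required, f.name))
        for f in fields(required)
        if getattr(candidate, f.name) != getattr(required, f.name)
    }


def shorthand(settings):
    """Render the conventional shorthand, for example '9600 8N1'."""
    return f"{settings.baud_rate} {settings.data_bits}{settings.parity[0]}{settings.stop_bits}"


print(shorthand(REQUIRED))                            # 9600 8N1
print(mismatches(SerialSettings(baud_rate=115200)))   # {'baud_rate': (115200, 9600)}
```

An empty result from `mismatches` means the candidate profile matches the listed settings.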

For more information, see the Knowledge Base article, “Connecting to a Data Domain System with a Serial Cable,” on the Data Domain Support Portal:

https://my.datadomain.com/download/kb/all/Connecting_to_the_Data_Domain_System_with_a_Serial_Cable.html (Support login is required.)

Serial Console Redirection

If you are connecting a serial console or laptop (which use the system’s DB9 port), you may need to change system BIOS settings. Below are the steps to verify or change the BIOS settings. Make sure that there are no backups in progress before proceeding.

Note: In this section, you power up or reboot the Data Domain system. If the system is crashed or hung and cannot be powered up or rebooted from DD OS, refer to the following Knowledge Base articles on the Data Domain Support Portal:

• “New System Will Not Power On or Boot Completely” https://my.datadomain.com/download/kb/rp/New_System_Will_Not_Boot_DOA.html?query=query%3Dwon%27t+boot&fsearch=1&pagenumber=1&size=10&index=0&filterids=194328

• “Data Domain System Will Not Boot and Has Been Working” https://my.datadomain.com/download/kb/rp/Will_Not_Boot_Has_Been_Working.html?query=query%3Dwon%27t+boot&fsearch=1&pagenumber=1&size=10&index=1&filterids=194328

(Support login is required.)

Table 3: Data Domain Default BIOS Passwords

System | Password
DD140, DD610, DD630, DD670 | d600d
DD880, DD880g | d880d

1. If the system is powered down, press the power button on the front of the system to power it up and skip to step 3.

2. If the system is powered up and there is a system prompt on the console, then:

a. Log in as sysadmin (the Data Domain default password is abc123) and enter

# system reboot

b. Answer yes to the Are you sure? prompt.

3. As the system boots, watch for the BIOS prompt, then press the F2 key repeatedly to enter BIOS mode. Enter the Data Domain default password from Table 3 or else a user-specified password, if you are prompted.

The BIOS setup utility appears.

4. Edit the system BIOS to enable serial console redirection, as shown in Table 4.

Table 4: BIOS Settings for Serial Console Redirection

System | Setting
DD140, DD610, DD630, DD670 | BIOS > Server > Remote Access Configuration > Redirection after BIOS POST > [Always]
DD880, DD880g | BIOS > Server Management > Console Redirection > Legacy OS Redirection > [Enabled]

Figure 2 shows the initial BIOS setup screens for serial console redirection.

[Figure 2 screens: all systems except DD880 and DD880g; systems DD880 and DD880g only.]

Figure 2: BIOS Setup Utility

5. Save changes and exit the BIOS to reboot (press F10, if the key is available).

Select and Run Diagnostics

Figure 3 shows the troubleshooting flow.

1) Find your problem in the problem list and note the diagnostics specified. (See “Find the Problem Definition and Its Specified Diagnostics” on page 13.)

2) Reboot the Data Domain system to offline diagnostics mode. (See “Reboot the System and Prepare to Run Diagnostics” on page 18.)

3) Run the diagnostics specified for your problem and check the results. (See “Run Diagnostics and Check Results” on page 21.)

4) Perform the recommended service action for failed diagnostics and get additional information from diagnostic logs. (See “Perform the Recommended Service Actions” on page 26.)

5) Save logs to the system disk and (optional) USB key, then quit diagnostics and reboot the system. (See “Save Logs and Exit Diagnostics” on page 28.)

Figure 3: Diagnostic Troubleshooting Flow

Find the Problem Definition and Its Specified Diagnostics

Figure 4 shows failures identified for supported systems and specifies which diagnostics to run. If there are additional symptoms, such as behavior, messages, or alerts, go to the table indicated to obtain the diagnostics to run. Then go to “Reboot the System and Prepare to Run Diagnostics” on page 18.

[Figure 4 flowchart. Starting points: New Installations or Upgrades; Installed Systems. For each failure, the flowchart asks whether there are additional symptoms and, if so, points to a table:

• Failure: System powerup, bootup, or filesystem startup. Go to: Table 5 on page 14

• Failure: System panics and automatically reboots. Go to: Table 6 on page 14

• Failure: FRU or slot disabled, FRU absent, or FRU or connections faulty. Go to: Table 7 on page 15 to identify additional symptoms and select the appropriate diagnostics

• Failure: Reduced performance or throughput. Go to: Table 8 on page 17

• Failure: New installation, upgrade, or maintenance issue. Go to: Table 9 on page 17

• Failure: System is hung

Actions shown when there are no additional symptoms: Run: All diagnostics, or Run: Network External L/B Test, Motherboard PCIe Topology Test, HDD Comprehensive Test, and SAS or FC (DD880g only, no VTL) Test.]

Note: You can access online diagnostics log messages with alerts and information in the bios.txt file using the DD OS log view command. Autosupport (ASUP) reports containing alerts from online diagnostics can be generated and viewed using the autosupport show report command. Refer to the Command Reference Guide.

Figure 4: Failure Identification and Relevant Diagnostics

Table 5: Powerup, Bootup, or Filesystem Problems with Additional Symptoms

Console Message/Alerts/Other:
Boot messages during system startup indicate that one or more network port configuration operations failed.
Tests to Run: Network Internal Loopback Test

Console Message/Alerts/Other:
Console displays NVRAM errors during boot.
Alert indicates multi-bit uncorrectable errors on NVRAM card after power cycle.
Alert indicates that NVRAM card battery is low.
Alert indicates that DD OS has disabled NVRAM batteries or that batteries have not fully charged.
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other:
System is unable to boot with expansion shelves connected, but does boot when they are disconnected.
Tests to Run: SAS Diagnostics Test

Console Message/Alerts/Other:
UNKNOWN TOPOLOGY appears on the console at powerup, indicating that there is no communication with a storage device or other IO target. An LED tells you the link status between two points.
• Cable connection is suspected.
• FC HBA problem with SFP connector is suspected.
• Zone issue (switch) configuration problem is suspected.
Tests to Run: FC Diagnostic (DD880g only, no VTL)

Console Message/Alerts/Other:
Memory slot is disabled.
• Example message in bios.txt: …| Slot/Connector #0xe3 | Slot is Disabled | Asserted)
Tests to Run: Memory Diagnostics

Table 6: System Panics and Reboots with Additional Symptoms

Console Message/Alerts/Other:
System panics and automatically reboots.
• Single-bit flip is logged in messages.engineering and kern.info.
• MCE error is logged in kern.info.
Tests to Run: CPU Test, CPU MCE Test

Console Message/Alerts/Other:
System panic and reboot is caused by an uncorrectable ECC (POST or runtime) error.
• Console message: Uh-huh. NMI received for unknown reason 20
• Typical alert: ALERT: MSG-TOOLS-00004: DRAM uncorrected multi bit error
DIMM failed self-test during bootup.
• Slot fault or disable message is logged in bios.txt during POST.
Tests to Run: Memory Diagnostics

Console Message/Alerts/Other:
System reboots early in boot cycle.
Tests to Run: HDD Quick Test

Console Message/Alerts/Other:
System keeps rebooting and is unstable with a NIC card, but stable without it.
Tests to Run: Network External Loopback Test

Table 7: Slot or FRU Disabled, FRU Absent, or Faulty FRU or Connection with Additional Symptoms

Console Message/Alerts/Other:
Memory slot is disabled.
• Example message in bios.txt: …| Slot/Connector #0xe3 | Slot is Disabled | Asserted)
DIMM failure is suspected.
Correctable ECC limit is exceeded (runtime error).
• ECC errors are logged in bios.txt.
• Typical alerts:
ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors Corectable ECC limit exceeded
ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors
Tests to Run: Memory Diagnostics

Console Message/Alerts/Other:
DD OS disables a memory DIMM slot because of excessive correctable errors.
• Typical alert: ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors
Faulty memory, CPU, or motherboard is suspected.
Faulty QPI link is suspected.
Tests to Run: Memory Diagnostics, CPU Test, CPU MCE Test

Console Message/Alerts/Other:
System is unable to access a device when it is physically present, device fails to respond to PCIe transactions.
• Incorrect platform topology is suspected.
• Faulty connection is suspected.
Tests to Run: Motherboard PCIe Topology Test

Console Message/Alerts/Other:
LUN is present but cannot do IOs (configuration issue).
Bad connection to LUN is suspected.
Bad (direct-attached) cable connection is suspected.
Tests to Run: FC Diagnostic (DD880g only, no VTL)

Console Message/Alerts/Other:
System controller hard disk drive (HDD) is absent.
Tests to Run: HDD Quick Test, HDD Comprehensive Test

Console Message/Alerts/Other:
NVRAM card is absent.
NVRAM card batteries are disabled.
NVRAM card battery fault is suspected.
NVRAM card battery connection fault is suspected after visual inspection.
NVRAM batteries have not fully charged.
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other:
DD OS SAS diagnostics command enclosure test topology detected a connectivity problem and further fault isolation is necessary.
Multiple drive failures occurred.
Repeated failures or absences of drives, especially if in the same slot.
Multipath errors on failure.
Tests to Run: SAS Diagnostics Test

Console Message/Alerts/Other:
Statistics output from the DD OS command net config (alias ifconfig) shows abnormally high values for Tx/Rx and error counters. Possible issues are with the system bus or NIC IO interfaces.
Statistics output from the DD OS command ethtool -S ifname shows errors such as DMA underrun, DMA overrun, frame errors, and CRC errors. The diagnostic verifies that the controller is functional.
Note: Contact Data Domain Support for information on using the ethtool -S ifname command.
PCIe reads terminate with aborts, and all Fs are returned by the system, resulting in big values. This could be an indication of problems with the system bus and network device's IO interface. The diagnostic confirms whether there is an issue with either the IO slot or the controller.
The link light on the NIC card does not get turned on after:
• Changing the cable, transceiver, and switch port.
• Issuing the DD OS net config ifname up command, even if the ports are connected to a switch or a peer device.
Tests to Run: Network External Loopback Test, Motherboard PCIe Topology Test

Table 8: Reduced Performance or Throughput with Additional Symptoms

Console Message/Alerts/Other:
System is slow.
Monitor kern.info for disk-related errors or excessive retries causing sluggish or slow response.
Tests to Run: HDD Comprehensive Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Console Message/Alerts/Other:
System shows a large, random slowdown of throughput.
Network performance is low. The diagnostic shows whether or not hardware is functional.
CIFS/NFS applications fail on backup or restore, with many resets.
Tests to Run: Network External Loopback Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Console Message/Alerts/Other:
System shows degraded IO performance.
System shows reduced IO throughput.
Tests to Run: Motherboard PCIe Topology Test, Network External Loopback Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Table 9: Installation, Upgrade, or Maintenance Issue with Additional Symptoms

Console Message/Alerts/Other:
System fails a fresh install and drops into kernel debug mode (kdb).
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other:
The system is down and not booting up properly.
Tests to Run: See Table 5 on page 14.

Console Message/Alerts/Other:
The system is down and you want to check for hard drive failure.
Tests to Run: HDD Quick Test, HDD Comprehensive Test

Console Message/Alerts/Other:
On a new installation, you need to verify that shelves are connected according to the installation plan.
Tests to Run: SAS Diagnostics Test

Reboot the System and Prepare to Run Diagnostics

Note: Make sure a console is connected to the Data Domain system, either onsite or via a remote serial link. For more information, see “System Console” on page 6. If the system is crashed or hung and cannot be powered up or rebooted from DD OS, refer to the note on page 10.

1. If the system is powered down, press the power button on the front of the system to power it up and skip to step 3.

2. If the system is powered up and there is a system prompt on the console, stop any backups that are running or wait until those backups are completed, then:

a. Log in as sysadmin (the Data Domain default password is abc123) and enter

# system reboot

b. Answer yes to the Are you sure? prompt.

3. During bootup, the following message prints repeatedly on the console:

Press any key to continue

Within ten seconds, press and hold down the spacebar until the boot menu appears.

Caution: Do not press any other key, as unexpected behavior may result.

4. The boot menu appears.

Use the up arrow and down arrow keys to select the offline diagnostics option for the console interface being used, then press Enter.

5. If this message appears, the automatic model detection function was unable to identify the system model, or does not support it.

Dismissing the screen brings up a list of supported models. Use the up arrow and down arrow keys to select your model, then press Enter.

Caution: If your system is not on the list of supported models, do not select unknown. Contact Data Domain Support.

6. After automatic model detection or after you enter the correct model manually, the Diagnostics Main Menu appears.

Before and After Running Diagnostics

Inserting and Removing a USB Key for Writing Logs

To save log files to a USB key automatically following a diagnostics run, insert the USB key after the Main Menu appears, but before running diagnostics. Remove the USB key before exiting the Main Menu (and rebooting the system). The USB key is unmounted automatically.

Installing Loopback Cables (DD670 Only)

Loopback cables must be installed when you run the Network External Loopback Test. You can use the cables currently installed on the system, or equivalents. Cross-over cables are not necessary. Using the wrong cable will cause the test to fail.

Note: Before removing or changing any cable connections, note or mark the connector and port locations, so that you can easily restore them after running diagnostics.

Loopbacks can be implemented between built-in Ethernet ports, or between ports on the same Ethernet card (NIC) as shown in Figure 5. Refer to the Hardware Overview or Installation and Setup Guide for your system to obtain the location of built-in Ethernet ports and optional NIC slot assignments.

Note: Do not make a loopback from the maintenance port or single-port cards, which are not supported for loopback testing.

Connect the loopback cables as follows.

[Figure 5 panels: Built-in Ethernet, Intel SFP+, Chelsio (Port 1, Port 2, loopback cable); Other NIC copper ports (Port 1 through Port 4, loopback cables for 2 or 4 ports); Other NIC fiber ports (Port 1 and Port 2, each with Tx and Rx, pair of loopback cables).]

• Built-in Ethernet (RJ45), Intel SFP+, or Chelsio: Connect the two ports.

• Other NIC copper ports (RJ45): Connect ports on the same card. (Rx and Tx are determined automatically.)

• Other NIC fiber ports: Each port has Tx and Rx connectors. Connect the Tx side of one port to the Rx side of the other port (on the same card).

Figure 5: Loopback Connections
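The cabling rules above can be written down as a small planning aid. This is a sketch under two assumptions that the text does not state outright: copper ports on a card are paired in order (Port 1 with Port 2, Port 3 with Port 4), and a fiber loopback uses a pair of cables so that each port's Tx reaches the other port's Rx. The function names are hypothetical.

```python
def copper_loopbacks(ports):
    """Pair ports on the same card in order: (1, 2), (3, 4).

    Assumption: adjacent ports are paired. An odd number of ports cannot be
    looped back, so it is rejected.
    """
    if len(ports) % 2 != 0:
        raise ValueError("loopback testing needs an even number of ports")
    return [(ports[i], ports[i + 1]) for i in range(0, len(ports), 2)]


def fiber_loopbacks(port_a, port_b):
    """Return the two Tx-to-Rx cable runs between two fiber ports on one card.

    Assumption: both directions are cabled (a pair of loopback cables).
    """
    return [
        (f"{port_a} Tx", f"{port_b} Rx"),
        (f"{port_b} Tx", f"{port_a} Rx"),
    ]


print(copper_loopbacks([1, 2, 3, 4]))        # [(1, 2), (3, 4)]
print(fiber_loopbacks("Port 1", "Port 2"))
# [('Port 1 Tx', 'Port 2 Rx'), ('Port 2 Tx', 'Port 1 Rx')]
```

Writing the plan down before moving cables also gives you a record for restoring the original connections afterward.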

Remove the loopback connections and restore the previous network cable connections before exiting the Main Menu (and rebooting the system). The presence of loopback cables does not affect running the Network Internal Loopback Test.

Go to “Run Diagnostics and Check Results” on page 21.


Run Diagnostics and Check Results

Navigating the Diagnostics Interface

Figure 6 shows the diagnostics menus and flow.

Boot Offline Diagnostics Suite

Diagnostics Main Menu

1. Run Automatic Diagnostics

2. Run Custom Diagnostics

3. Settings

4. Exit (exit diagnostics and reboot)

• Please Select Model: Select the system model if autodetect is not correct.

• Diagnostic Selection Menu: Select diagnostics to run from the list of diagnostics available for your model.

• Diagnostics Selected to Run: Confirm the diagnostics to run in this session and start test execution.

• Diagnostics Run Screen: Appears when diagnostics are running. Reports status and execution information. The flow then branches on "All tests passed" or "One or more tests failed".

• Diagnostic Recommendations Screen: Displays recommended service actions for all FRUs that failed diagnostics.

• Diagnostic Log Access Menu: Select logs from passed or failed diagnostics.

• Test Log: View diagnostic logs (with recommended service actions).

Log files are written to the system disk and to a USB key, if inserted. (Log files from previous runs are removed first.)

Figure 6: Diagnostic Menus and Flow
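The flow in Figure 6 can also be read as a transition table. The Python sketch below is an informal model for orientation only. Screen names follow the figure, the event labels are informal descriptions rather than exact menu text, and the return from Please Select Model to the Main Menu is an assumption.

```python
# (current screen, event) -> next screen
FLOW = {
    ("Main Menu", "Run Automatic Diagnostics"): "Diagnostics Run Screen",
    ("Main Menu", "Run Custom Diagnostics"): "Diagnostic Selection Menu",
    ("Main Menu", "Settings"): "Please Select Model",
    ("Main Menu", "Exit"): "Reboot",
    ("Please Select Model", "model selected"): "Main Menu",  # assumption
    ("Diagnostic Selection Menu", "Run Selected Diagnostics"): "Diagnostics Selected to Run",
    ("Diagnostics Selected to Run", "Run"): "Diagnostics Run Screen",
    # Logs are written before control returns to the Main Menu.
    ("Diagnostics Run Screen", "all tests passed"): "Main Menu",
    ("Diagnostics Run Screen", "one or more tests failed"): "Diagnostic Recommendations Screen",
    ("Diagnostic Recommendations Screen", "view logs"): "Diagnostic Log Access Menu",
    ("Diagnostic Recommendations Screen", "exit"): "Main Menu",
    ("Diagnostic Log Access Menu", "select log"): "Test Log",
    ("Diagnostic Log Access Menu", "back"): "Diagnostic Recommendations Screen",
    ("Diagnostic Log Access Menu", "exit"): "Main Menu",
    ("Test Log", "exit"): "Diagnostic Log Access Menu",
}


def walk(start, events):
    """Follow a sequence of events and return the screen reached."""
    screen = start
    for event in events:
        screen = FLOW[(screen, event)]
    return screen


print(walk("Main Menu", ["Run Custom Diagnostics", "Run Selected Diagnostics", "Run"]))
```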


Special Keys

Table 10 lists special keys you can use in the diagnostics user interface.

Table 10: Special Keys for Diagnostics

Key: Use

F1: Displays a help window for the current menu or screen.

H: Displays help information for a highlighted item.

Enter: Selects a highlighted item.

spacebar: Toggles to enable or disable a highlighted item.

up arrow, down arrow: Moves highlighting up or down.

Page Up, Page Down: Scrolls up or down in a window (also see the b and f keys).

b, f: Scrolls backward or forward in a window. If you are using a console application via a remote serial line, use the b and f keys to page up and down. The Page Up and Page Down keys may return control to the Main Menu, and you will not be able to view logs after that.

Esc: Returns to the previous window, or skips to the initial Main Menu.

Ctrl+c: Interrupts the diagnostic run. All diagnostics in the session are cancelled and no log files are written. Returns to the Diagnostics Main Menu after a few seconds.

Ctrl+Alt+Delete: Exits the diagnostics user interface and reboots the system.


Main Menu

The Main Menu appears after you boot offline diagnostics.

[Screen callouts: time in offline diagnostics mode; system model]

Main Menu selections are:

• Run Automatic Diagnostics - Run all diagnostics that apply to your system. Test execution starts immediately and the Diagnostics Run screen appears (see page 25).

The following diagnostics have special setup requirements or run times of 1 hour or longer and are enabled for Run Automatic Diagnostics:

• Network External Loopback Test (requires installing loopback cables)

• Memory Diagnostics (run time is 63 minutes)

• HDD Comprehensive Test (run time is 60 minutes)

These tests can be deselected by choosing Run Custom Diagnostics.

Note: If you select Run Automatic Diagnostics, the Network External Loopback Test is enabled by default. Before running automatic diagnostics, you must install loopback cables as described on page 20; otherwise the test will fail. If you do not want to run this test now, choose Run Custom Diagnostics and deselect the test.

• Run Custom Diagnostics - Select a subset of the diagnostics that apply to your system in the Diagnostic Selection Menu (see page 24).

Note: Before running the Network External Loopback Test, you must install loopback cables as described on page 20; otherwise the test fails.

• Settings - Specify your system’s model from a list of supported models if it is not identified correctly at the top of the Main Menu (see page 19).

Caution: If your system is not on the list of supported models, do not select unknown. Contact Data Domain Support.

• Exit - Quit offline diagnostics and reboot the system (after confirming the exit).

To reboot the system to normal operation, press any key (except the spacebar) within ten seconds.

Notes:

• If you inserted a USB key to store log files, remove it before exiting diagnostics.

• If you installed loopback cables for the Network External Loopback test, remove them and restore the previous network cable connections before exiting diagnostics.


Diagnostic Selection Menu

If you selected Run Custom Diagnostics in the Main Menu, the Diagnostic Selection Menu appears. This menu lists all applicable diagnostics for the system.

[Screen callouts: estimated diagnostic runtimes; diagnostic selected indicator; estimated total runtime; continue to the Diagnostics Run screen (page 25)]

1. Select or deselect individual diagnostics or groups of diagnostics according to “Find the Problem Definition and Its Specified Diagnostics” on page 13. Highlight the test or group, then press the spacebar to select ([x]) or deselect ([ ]) it.

2. Highlight Run Selected Diagnostics, and press Enter to display the Diagnostics Selected to Run screen and confirm your choices.

3. Select Run in the Diagnostics Selected to Run screen and press Enter to begin test execution (see “Diagnostics Run Screen” on page 25).

Note: The diagnostics user interface displays the maximum possible runtime, including the time for the diagnostic to time out. This is not the typical time.
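As a rough planning aid, the run times quoted in this guide can be added up for the tests you intend to select. The sketch below assumes the estimated total is the sum of the per-test figures, which this guide does not state explicitly. Only the two run times given under "Main Menu" are included, and `RUNTIME_MIN` and `total_runtime` are hypothetical names.

```python
# Run times stated in this guide, in minutes. Other tests are omitted
# because their run times are not given here.
RUNTIME_MIN = {
    "Memory Diagnostics": 63,
    "HDD Comprehensive Test": 60,
}


def total_runtime(selected):
    """Sum the listed run times for the selected tests (assumed additive)."""
    return sum(RUNTIME_MIN[name] for name in selected)


print(total_runtime(["Memory Diagnostics", "HDD Comprehensive Test"]))  # 123
```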


Diagnostics Run Screen

The Diagnostics Run screen is active whenever diagnostics are executing.

[Screen callouts: estimated total runtime remaining; results of completed tests; current test executing and time remaining. If all tests passed, write logs and return to the Main Menu (page 23); if one or more tests failed, continue to the Diagnostic Recommendations screen and diagnostic logs (page 26).]

1. If all diagnostics pass, press Enter to automatically write test logs and return to the Main Menu.

2. If one or more diagnostics fail, press Enter to go to the Diagnostic Recommendations screen, which provides suggested service actions and access to diagnostic logs.

Note: Running Memory Diagnostics causes the system to automatically reboot or hang if sufficient memory is not available. This issue is better addressed in DD OS 5.0 offline diagnostics with the availability of the Examine System Inventory feature.


Perform the Recommended Service Actions

If one or more diagnostics fail, the Diagnostic Recommendations screen displays service actions suggested by the failing diagnostics. Perform all recommended service actions before concluding that the FRU has failed.

[Screen callouts: write logs and return to the Main Menu (page 23), or go to the Diagnostic Log Access Menu (page 27)]

Note: When using a console application via a remote serial line, use the b and f keys to page up and down. The Page Up and Page Down keys may return control to the Main Menu, and you will not be able to view logs after that.

Caution: Recommended service actions can include reseating and replacing cards and other components. These activities involve powering down the system and removing system covers for FRU access. Follow the FRU’s installation guide to perform these tasks. These guides are available from the Data Domain Support Portal at:

http://my.datadomain.com/US/en/parts.jsp

(Support login is required.)

Using Log Files

Most diagnostics contain several subtests that check different conditions and, upon failure, generate a different recommended service action for each. Recommended service actions appear on the console, but the conditions that generate them are recorded only in the full test log. Before performing the recommended service actions, you can display the full log file to get more information on the failing condition (see “View Log Files” on page 27). You can also view log files after exiting diagnostics (see “Get Log Information After Running Diagnostics” on page 29).


View Log Files

Note: When using a console application via a remote serial link, use the b and f keys to page up and down. The Page Up and Page Down keys may return control to the Main Menu, and you will not be able to view logs after that. See Table 10 on page 22 for keys to navigate menus and log files.

Diagnostic Log Access Menu

From the Diagnostic Log Access Menu, you can select and display logs for all tests run. If diagnostics are run more than once, all logs from the previous run are removed before the new logs are written.

[Screen callouts: write logs and return to the Main Menu (page 23), go back to the Diagnostic Recommendations screen (page 26), or display the selected test log; from a test log, return to the Diagnostic Log Access Menu]


For example log file listings with annotations, see:

• “Example diag_log.sub File” on page 30

• “Example USB Key Log File” on page 32

Save Logs and Exit Diagnostics

Saving Test Log Files to the System Controller Disk and a USB Key

Exiting from the following windows results in log files being written to the system boot disk and to an external USB key (if one is inserted):

• Diagnostics Run screen with All tests PASSED (select OK, then press Enter)

• Diagnostic Recommendations screen (press the Esc key or the Q key)

• Diagnostic Log Access Menu (select Return to Main Menu, then press Enter)

Status screens appear while the required drivers are loaded and logs are written, then control is returned to the Main Menu.

Note: If a core dump occurs while you are running offline diagnostics, the core file is written to the SUB area for analysis by Data Domain Support. Core files are not saved to USB keys.

Saving Logs to a USB Key Only

Logs cannot be saved to the system controller boot disk if:

• Disk hardware is not functional.

• The disk has been swapped out (serial number mismatch with system).

• DD OS 4.9 is not installed.

If logs cannot be saved to disk and no USB key is present, you are prompted to insert a USB key. You can use any FAT32 (Unix VFAT) formatted USB key with at least 10 MB of free space. If you do not insert a USB key, no diagnostic logs are saved.
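Before taking a USB key to the system, you can confirm on a workstation that it has enough free space. The following Python sketch is one way to do that, under stated assumptions: the key is already mounted on the workstation, 10 MB is interpreted as 10 × 1024 × 1024 bytes, and `has_room_for_logs` and the mount point are hypothetical names. It does not verify that the key is FAT32 formatted.

```python
import shutil

# Assumption: "10 MB" is taken as 10 * 1024 * 1024 bytes.
MIN_FREE_BYTES = 10 * 1024 * 1024


def has_room_for_logs(mount_point: str) -> bool:
    """Return True if the filesystem at mount_point has at least 10 MB free."""
    return shutil.disk_usage(mount_point).free >= MIN_FREE_BYTES


# Replace "." with the path where the USB key is mounted on your workstation.
print(has_room_for_logs("."))
```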

Note the following messages:

• No USB key present indicates that the inserted key is defective or not formatted. Verify formatting and try a different port or a different USB key.

• Could not write log files to USB key indicates that the key has less than 10 MB of free space.

After logs are successfully saved to the USB key, you are returned to the Main Menu.

Remove the USB key before exiting the Main Menu (and rebooting the system).


Get Log Information After Running Diagnostics

After running diagnostics, ASCII-format logs are:

• Concatenated and written to a single diag_log.sub file on the system disk

• Written as separate files to a USB key (if one is inserted)

Log File on the System Disk

If you are able to reboot the system to online mode, you can get test logs for the last diagnostics run. These are concatenated into a single file on the system boot disk:

/ddr/var/log/debug/platform/diag_log.sub

Viewing Logs on the System Disk

Offline diagnostics logs are saved to the system boot disk and are readable using the DD OS log view command:

# log view debug/platform/diag_log.sub


Example diag_log.sub File

Figure 7 shows a diag_log.sub file containing an NVRAM test log and a framework log that lists the sequence of diagnostic tests and indicates which tests were run. Because all diagnostic logs are concatenated into a single file, the left column identifies the test log and the right column contains log information.

File created at Mon Aug 22 15:30:16 PDT 2011
...
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [START] Timestamp: Mon Aug 22 22:19:07 2011
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Diagnostic STDOUT and STDERR log for 'NVRAM Card Test' with switches '-e'
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] (DDOS Version: 4.9.3.1-245662) (pid: 10603) (sequence number: 2) (iteration: 1) (list element: 1)
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] ******************************
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Testing /dev/umema
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] ******************************
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing ECC and Battery test on /dev/umema partition 2.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing PRE read ECC
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [STDERR]67521+0 records in
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] 67521+0 records out
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 read only test.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing POST read ECC check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing battery check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Size: 1.024 MBytes
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Number of batteries: 3
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [0]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [1]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [2]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 battery check 3 batteries OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] PASSED ECC and Battery test on /dev/umema partition 2.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing ECC and Battery test on /dev/umema partition 3.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing PRE read ECC
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema3 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [STDERR]2025472+0 records in
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] 2025472+0 records out
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema3 read only test.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing POST read ECC check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Failed /dev/umema3 ECC check. PCI errors 0 memory errors 1 detected.

Name of diagnostic log Log information

Note: Timestamps in the diag_log.sub file use the UTC time standard, which is independent of time zone and daylight saving time offsets.

Figure 7: NVRAM Test and Framework Logs in diag_log.sub File (1 of 2)


Figure 7: NVRAM Test and Framework Logs in diag_log.sub File (2 of 2)

[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] RECOMMENDATIONS_START
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Failed /dev/umema3 ECC check. PCI errors 0 memory errors 1 detected.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Replace the NVRAM card.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] RECOMMENDATIONS_END
...
[/tmp/ddod/framework.log] Diagnostic sequence starting...
[/tmp/ddod/framework.log] 1: Skipping diagnostic 'Memory Diagnostics'
[/tmp/ddod/framework.log] 2: (1:1) Running diagnostic 'NVRAM Card Test' with switches '-e'
[/tmp/ddod/framework.log] 2: (1:1) Run diagnostic 'NVRAM Card Test' with switches '-e' on system-selected default CPU.
[/tmp/ddod/framework.log] 2: FAILED: 'NVRAM Card Test' returned status 1
[/tmp/ddod/framework.log] 3: Skipping diagnostic 'CPU Test'
[/tmp/ddod/framework.log] 4: Skipping diagnostic 'CPU MCE Test'
[/tmp/ddod/framework.log] 5: Skipping diagnostic 'Motherboard PCIe Topology Test'
[/tmp/ddod/framework.log] 6: Skipping diagnostic 'Network Internal Loopback Test'
[/tmp/ddod/framework.log] 7: Skipping diagnostic 'Network External Loopback Test'
[/tmp/ddod/framework.log] 8: Skipping diagnostic 'HDD Quick Test'
[/tmp/ddod/framework.log] 9: Skipping diagnostic 'HDD Comprehensive Test'
[/tmp/ddod/framework.log] 10: Skipping diagnostic 'SAS Diagnostics Test'
[/tmp/ddod/framework.log] Diagnostic sequence completed...

Name of diagnostic log Log information
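Because diag_log.sub interleaves several logs in one file, it can help to split a copy of the file back into its component logs when reviewing it offline. The Python sketch below assumes each entry is on its own line and begins with the bracketed path of its source log, as in Figure 7. The function names `split_by_log` and `failed_tests` are hypothetical, and the sample lines are taken from Figure 7.

```python
import re
from collections import defaultdict

# Each entry is "[<path of source log>] <log text>".
ENTRY = re.compile(r"^\[(?P<log>/[^\]]+)\]\s?(?P<text>.*)$")
FAILED = re.compile(r"FAILED: '([^']+)' returned status (\d+)")


def split_by_log(lines):
    """Group diag_log.sub lines by the source log named in the left column."""
    groups = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:  # lines without a log prefix (e.g. "File created at") are skipped
            groups[match.group("log")].append(match.group("text"))
    return dict(groups)


def failed_tests(framework_lines):
    """Return (test name, status) pairs for FAILED entries in the framework log."""
    results = []
    for line in framework_lines:
        match = FAILED.search(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


sample = [
    "File created at Mon Aug 22 15:30:16 PDT 2011",
    "[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Testing /dev/umema",
    "[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [0]: Status OK",
    "[/tmp/ddod/framework.log] 2: FAILED: 'NVRAM Card Test' returned status 1",
    "[/tmp/ddod/framework.log] 3: Skipping diagnostic 'CPU Test'",
]
logs = split_by_log(sample)
print(failed_tests(logs["/tmp/ddod/framework.log"]))
```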


Log Files on a USB Key

Viewing Logs on a USB Key

Log files are written to a USB key, if inserted, when you return to the Main Menu after running diagnostics or when you select Save Diagnostic Logs to USB Key on the Main Menu. These logs are displayed when running Offline Diagnostics.

The parent log directory /diag_logs is created off the USB root and a subdirectory /<log-mm-dd-hh-mm> is created (where the first mm = month, dd = day, hh = hour, and the second mm = minute at which the logs were saved). Individual diagnostic logs and a log of the diagnostic flow are saved in this subdirectory. These logs are saved in ASCII format for viewing on any Linux or Windows machine.
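To locate the logs from a particular run on the key, you can build the expected subdirectory name from the time the logs were saved. The sketch below is a guess at the naming scheme based on the pattern above. It assumes a literal "log-" prefix, zero-padded two-digit fields, and a 24-hour clock, none of which this guide confirms. `usb_log_dir` is a hypothetical helper name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def usb_log_dir(saved_at: datetime) -> PurePosixPath:
    """Build /diag_logs/log-mm-dd-hh-mm for a save time (format is assumed)."""
    return PurePosixPath("/diag_logs") / saved_at.strftime("log-%m-%d-%H-%M")


print(usb_log_dir(datetime(2011, 8, 22, 22, 19)))  # /diag_logs/log-08-22-22-19
```

If the directory on your key does not match, list the contents of /diag_logs and use the actual name.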

Note: Timestamps in USB log files use the UTC time standard, which is independent of time zone and daylight saving time offsets.
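When correlating log entries with site events, it can be useful to convert a UTC log timestamp to local time. The Python sketch below parses the timestamp format shown in Figure 7 and converts it using a fixed UTC-7 offset for Pacific Daylight Time. The fixed offset and the function name `parse_utc` are illustrative assumptions; substitute the offset for your site. Parsing the day and month names assumes an English locale.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse a log timestamp such as 'Mon Aug 22 22:19:07 2011' as UTC."""
    parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


# Fixed offset for illustration only; it does not track daylight saving changes.
PDT = timezone(timedelta(hours=-7), "PDT")

utc_time = parse_utc("Mon Aug 22 22:19:07 2011")
local_time = utc_time.astimezone(PDT)
print(local_time.strftime("%a %b %d %H:%M:%S %Z %Y"))  # Mon Aug 22 15:19:07 PDT 2011
```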

Example USB Key Log File

Figure 8 shows subtest results and recommended service actions for an NVRAM card.

...
Performing ECC and Battery test on /dev/umemb partition entire device.
Performing PRE read ECC
Passed /dev/umemb ECC check OK
[STDERR]2097152+0 records in
2097152+0 records out
Passed /dev/umemb read only test.
Performing POST read ECC check
Passed /dev/umemb ECC check OK
Performing battery check
Size: 1.024 MBytes¹
Number of batteries: 3
[0]: Status DISCHARGED
Charge in milli-Volts: 3863
Charge percent: 0.03
[1]: Status OK
Charge in milli-Volts: 4150
Charge percent: 100.00
[2]: Status OK
Charge in milli-Volts: 4150
Charge percent: 100.00
Failed /dev/umemb battery check 3 batteries found, 2 were OK.
RECOMMENDATIONS_START
Leave DDR as is in this Offline Diagnostic mode for at least 3 hours to allow battery to charge. Then reboot the system and re-run this diagnostic. If this failure persists then replace the NVRAM card.
RECOMMENDATIONS_END
...

1. The correct value is 1.024 GBytes.

[Figure 8 callouts: PRE read ECC subtest status; POST read ECC subtest status; battery check subtest status; recommended service actions]

Figure 8: NVRAM Test Log File on a USB Key
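The recommended service actions in a USB log file sit between the RECOMMENDATIONS_START and RECOMMENDATIONS_END markers, so they can be pulled out of a long log with a short script. The Python sketch below assumes each marker appears on its own line, as in the example log in this section. The function name `recommendations` is hypothetical.

```python
def recommendations(lines):
    """Collect the text between RECOMMENDATIONS_START and RECOMMENDATIONS_END."""
    blocks = []
    current = None  # None means we are outside a recommendations block
    for raw in lines:
        line = raw.strip()
        if line == "RECOMMENDATIONS_START":
            current = []
        elif line == "RECOMMENDATIONS_END":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and line:
            current.append(line)
    return blocks


sample = [
    "Failed /dev/umemb battery check 3 batteries found, 2 were OK.",
    "RECOMMENDATIONS_START",
    "Leave DDR as is in this Offline Diagnostic mode for at least 3 hours to allow battery to charge.",
    "RECOMMENDATIONS_END",
]
for block in recommendations(sample):
    print("\n".join(block))
```

In diag_log.sub the same markers appear after the bracketed log name, so strip that prefix first if you apply this to the concatenated file.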


Diagnostic Test Descriptions

Table 11 provides additional information on diagnostic test coverage.

Table 11: Offline Diagnostic Test Coverage

Test Name: Coverage

CPU MCE Test: Decodes and prints the stored machine check record generated by a machine check event.

• Most errors can be corrected by the CPU using internal error correction mechanisms. Uncorrected errors cause machine check exceptions that may panic the system.

• The MCE error condition displayed by the test unambiguously identifies the faulty component.

CPU Test: Tests the processor’s ability to perform a Compress–Uncompress–Compare operation sequence.

• An MD5 fingerprint is generated for the data file before it is compressed. The compressed file is then uncompressed and its MD5 fingerprint is compared against the one generated for the original file.

FC Diagnostic: Tests the Fibre Channel (FC) devices attached to FC cards.

Note: This applies to DD880g only; VTL is not supported.

• Verifies that Logical Unit Numbers (LUNs) are detected and configured correctly.

• Performs IOs to test the physical interface (path to storage).

• Directs LUNs to perform a read IO, then reads from the LUNs.

HDD Comprehensive Test: Tests all system controller disks.

• Reads disk sectors and their SMART data.

HDD Quick Test: Tests the system controller boot disk only.

• Reads disk sectors and their SMART data.

Memory Diagnostics: Tests available free memory and reports any ECC errors (correctable and uncorrectable) detected by hardware.

• Identifies and reports a failing DIMM or DIMM pair.

Motherboard PCIe Topology Test: Checks that the detected PCIe IO topology matches what the system was designed for.

• PCIe interconnect tests exercise and verify the PCIe subsystem. They ensure the connectivity for all the PCIe-based controllers and other IO targets present on the motherboard.

• The tests scan the entire PCIe fabric, starting from the PCIe root and looking for the expected PCIe topology. The tests indicate errors if an expected device or set of devices is absent.


Network External Loopback Test: Tests if the network controller’s data path is functional through the NIC Tx and Rx ports. Built-in Ethernet and dual- and quad-port NICs can be tested; single-port NICs cannot be tested.

This test generates and sends out packets, and expects to receive the same number of packets.

Note: Before you run the Network External Loopback Test, loopback cables must be installed. For instructions, see “Installing Loopback Cables (DD670 Only)” on page 20. Installing loopback cables for the Network External Loopback Test does not affect running the Network Internal Loopback Test.

Network Internal Loopback Test: Tests if the network controller’s data path is functional to the MAC layer. Loopback is through the internal loopback interface (MAC layer); packets do not leave the controller.

This test generates and sends out packets and expects to receive the same number of packets.

NVRAM Card Test: Performs the following tests and checks:

• Tests all partitions of NV memory on one or multiple NVRAM cards. Scans through the entire range of NV memory in each partition.

• Memory ECC.

• Battery status.

• Battery state (enabled/disabled).

SAS Diagnostics Test: Tests the SAS topology to determine the reliability of the connections.

• Pinpoints SAS connectivity problems to specific links.

• Tests each external SAS Host Bus Adapter (HBA) port and attached shelf chain.

• Generates read traffic to find errors that only occur under load.
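The Compress–Uncompress–Compare sequence described for the CPU Test in Table 11 can be illustrated with a few lines of Python. This is a sketch of the general technique only, not the diagnostic's implementation. This guide does not name the compression algorithm, so zlib is used here as a stand-in, random bytes stand in for the data file, and `compress_uncompress_compare` is a hypothetical function name.

```python
import hashlib
import os
import zlib


def compress_uncompress_compare(data: bytes) -> bool:
    """Fingerprint the data, round-trip it through compression, and compare."""
    before = hashlib.md5(data).hexdigest()            # fingerprint of the original
    restored = zlib.decompress(zlib.compress(data))   # zlib is a stand-in codec
    after = hashlib.md5(restored).hexdigest()         # fingerprint after the round trip
    return before == after


# 1 MiB of random bytes stands in for the test data file.
print(compress_uncompress_compare(os.urandom(1 << 20)))  # True
```

On healthy hardware the fingerprints match; a mismatch would mean the round trip altered the data.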
