debugging core files crash dumps unix linux

Upload: millajovavich

Post on 08-Nov-2014

Page 1: Debugging Core Files Crash Dumps UNIX Linux

1. Debug Process and Analyze Process Core on Solaris
1.1 General Commands
1.2 Analyze Process Core by Using mdb (Modular Debugger)
1.3 Analyze Process Core by Using adb on Solaris
1.4 Analyze Process Core by Using gdb on Solaris
1.5 Debug Process and Analyze Core by Using truss on Solaris
1.6 Analyze Process by using dbx
1.7 Generate a Process Core Dump on Solaris
1.8 Create Process Core
1.9 Examining Memory Address Spaces with mdb on Solaris
1.10 Debug Kernel, System Calls and Processes (DTRACE)
1.11 Other Debugging Tools on Solaris

2. Debug Process and Analyze Process Core on Linux
2.1 Debug Processes by using STRACE
2.2 Analyze a Process Core Dump with gdb on Linux
2.3 Analyze the Process Core using dbx on Linux
2.4 Analyze a Core Dump Using Oprofile on Linux
2.5 Debug Libraries and Symbols on Linux
2.6 Other Debugging Tools on Linux

3. Debug Process and Analyze Process Core on HP-UX
3.1 Debug Processes by using tusc
3.2 Installing tusc on HP-UX 11.xx
3.3 Debug Processes and Core Files by using HP WDB / GDB
3.4 Debug Processes by using truss
3.5 Analyze Process Performance by using Caliper on HP-UX 11.xx
3.6 Analyze Process Performance by using Prospect on HP-UX 11.xx
3.7 Live Memory Analysis on HP-UX 11.xx by using KWDB
3.8 Other Debugging Tools on HP-UX 11.xx

4. Debug Process and Analyze Process Core on IBM AIX
4.1 Debug Processes by using proctools
4.2 Debug Processes by using trace
4.3 Debug Processes by using syscalls
4.4 Debug Processes by using watch
4.5 Debug Processes by using ProbeVue
4.6 Debug Processes by using truss
4.7 Debug Processes by using dbx
4.8 Analyze a Process Core by using KDB
4.9 Other Debugging Tools on IBM AIX

5. Debug Process and Analyze Process Core on IRIX

6. Debug Process and Analyze Process Core on Tru64

7. Generate / Analyze a Crash Dump on Solaris
7.1 Save a Crash Dump on a Panic'd System
7.2 Setup a System to Save a Crash Dump
7.3 Crash Dump Analysis on Solaris by using MDB
7.4 Service Tool Bundle Service Crash Analysis Tool
7.5 Crash Dump Analysis on Solaris by using ADB
7.6 Crash Dump Analysis on Solaris by using Crash
7.7 Crash Dump Analysis on Solaris by using ACT
7.8 Other Crash Dump Analysis Tools on Solaris

8. Generate / Analyze a Crash Dump on HP-UX
8.1 Crash Dump Analysis by using KWDB
8.2 Remote Crash Dump Analysis
8.3 Crash Dump Analysis by using Q4
8.4 Crash Dump Analysis by using KWDB Q4 Mode
8.5 Crash Dump Analysis by using HP WDB / GDB
8.6 Crash Dump Analysis by using adb

9. Generate / Analyze a Crash Dump on Linux
9.1 Enable Saving Crash Dump by using kexec-tools
9.2 Simulate a Panic and Save a Crash Dump
9.3 Analyze Crash Dump by using crash
9.4 Analyze Crash Dump by using GDB
9.5 Analyze Crash Dump by using LKCD
9.6 Other Useful Commands

10. Generate / Analyze a Crash Dump on Linux
10.1 Setup and Enable KDB
10.2 Analyze a Crash Dump by using KDB

11. Debugging Tools
11.1 Information

1. Debug Process and Analyze Process Core on Solaris

1.1 General Commands

Show Process Tracebacks:
pstack core

Show Process Tracebacks on Running Process:
pstack process_id

Show Process Threads Info:
pflags core

Show Process Memory Mapping:
pmap core

Show Process Memory Mapping for a Running Process:
pmap -sx `pgrep testprog`

Show Kernel Info:
kstat -n system_misc

Check System Pages:
kstat -n system_pages

Check Processes:
prstat -Lmc 10 10 > prstat.out
more prstat.out

Debug Processes:
pargs core
pcred $$
pldd $$
psig $$
pfiles $$
pfiles pid
pstop $$
prun core
pwait pid
ptree $$
ptree pid
ptime core
pwdx $$
preap* core
pgrep -u rmc

Kernel Lock Statistics (Use -i 971 as Interval to Avoid Collisions with the Clock Interrupt and Gather Fine-Grained Data):
lockstat -i 971 sleep 300 > lockstat.out
lockstat -i 971 -I sleep 300 > lockstatI.out

Kernel Profiling:
lockstat -Ikw -i 997 sleep 10

CPU Traps Statistics:
trapstat -t

Gather CPU Hardware Counters per Process:
cputrack -N 20 -c pic0=DC_access,pic1=DC_miss -p 19849
bc -l

Gather CPU Statistics:
cpustat -c pic0=Cycle_cnt,pic1=DTLB_miss 1

Check Page Size:
pagesize -a

Set Page Size Preference:
ppgsz -o heap=4M ./testprog

Segmap Hit Rates Statistics:
kstat -n segmap

Dump ELF File:
elfdump -e /bin/ls

Dump Section Headers:
elfdump -c /bin/ls

Invoke the Runtime Linker on the Specified Binary File to Check which Libraries are Linked to it:
ldd netstat

Run pldd on Running Processes:
pldd $$

Get Linked List of All Processes:
kstat -n var
mdb -k
> max_nprocs/D

Library Tracing:
apptrace ls

Check Scheduling Classes:
dispadmin -l
priocntl -l

Check Scheduling Class and Threads Priority:
ps -eLc

Check Timeshare Dispatch Table:
dispadmin -g -c TS

1.2 Analyze Process Core by Using mdb (Modular Debugger)

mdb executable_name core_name
$C
$q

OR:
mdb core
::status
data::files
::stack
::walkers
::dcmds -l
cpu0::print cpu_t
::walk walk_name | ::dcmd
::walk cpu | ::print cpu_t
::cpu_t::sizeof
address::list

OR:
mdb -k

1.3 Analyze Process Core by Using adb on Solaris

Invoke the debugger:
adb -c core

Display the message buffer:
$<msgbuf

Get the thread list:
$<threadlist

Check the status:
$>status

Get the process crash time:
$>time/Y

Get the kernel memory structures:
$> kmastat

Quit the debugger:
$>q

1.4 Analyze Process Core by Using gdb on Solaris

The GNU Debugger (GDB) is a powerful debugger available for most major operating systems. Recent Solaris versions ship GDB on the installation media. The current release can also be downloaded separately.

Start gdb on core file:
gdb -c core

OR:
gdb a.out core

OR:
gdb path/to/the/binary path/to/the/core

OR from the gdb Prompt:
(gdb) core core

If no executable path is given, the debugger uses the invocation path of the process that generated the core file; this path is stored in the core file. If the invocation path is relative, you must specify the executable explicitly when debugging the core file.
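Because a relative invocation path may not resolve, it can be safer to always pass the binary explicitly. The following is a minimal sketch of that idea, not a definitive tool: the function name core_backtrace and its two arguments are hypothetical, while -batch and -ex are standard gdb options. It refuses to run unless both the binary and the core file are given and readable, then prints a backtrace non-interactively.

```shell
# core_backtrace BINARY CORE
# Hypothetical helper: print a backtrace from CORE using BINARY.
core_backtrace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: core_backtrace BINARY CORE" >&2
        return 2
    fi
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "core_backtrace: cannot read binary or core file" >&2
        return 1
    fi
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex "backtrace" "$1" "$2"
}
```

For example, core_backtrace ./a.out core prints the stack of the crashed process and returns to the shell.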

To start debugging, load the core file at the gdb Prompt:
(gdb) core core

List Status Inquiry Commands:
(gdb) help status

List Data Examination Commands:
(gdb) help data

List Stack Examination Commands:
(gdb) help stack

Select a Stack Frame by its Number:
(gdb) frame number

View Code around that Frame:
(gdb) list

List Local Variables:
(gdb) info locals

List File Commands:
(gdb) help files

List Maintenance (Internals) Commands:
(gdb) help internals

List Command Aliases:
(gdb) help aliases

List Support Facilities:
(gdb) help support

List Commands for Running the Program:
(gdb) help running

List Commands for Tracing Program Execution without Stopping it:
(gdb) help tracepoints

List User-defined Commands:
(gdb) help user-defined

List Obscure Features:
(gdb) help obscure

View the stack trace:
(gdb) backtrace

List all threads of the process at the time of the crash:
(gdb) info threads

View the specified thread:
(gdb) thread thread_id

Disassemble a specified section of memory:
(gdb) disassemble <address>

Display the memory contents at a specified address as a string:
(gdb) x/s <address>

Display the contents of the registers:
(gdb) info registers

Display all registers, including floating point registers:
(gdb) info all-registers

Display information about all of the shared libraries:
(gdb) info shared

Print the target that is currently under the debugger:
(gdb) info files
(gdb) info target

View the source code:
(gdb) list

Start the target program:
(gdb) run

Set a breakpoint:
(gdb) break sum

On a line number:
(gdb) b 25

At an offset from the current line:
(gdb) b +9
(gdb) b -1

On a memory address (use *):
(gdb) b *0x2324

Set a watchpoint on a variable or expression:
(gdb) watch x
(gdb) watch_target &x
(gdb) watch_target (<type of x> *) *<addr of x>

Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watchpoints

Help:
(gdb) help

1.5 Debug Process and Analyze Core by Using truss on Solaris

Trace System Calls of a Process or Command:
truss -p pid
truss -p 2975/3
truss /usr/local/sbin/snmpd

Trace System Calls, Faults and Signals of a Process or Command and Count them:
truss -c -p pid

Trace a Process, Follow its Children and Count Syscalls, Faults and Signals:
truss -cf -p pid

Trace System Calls, Environment Strings and Timestamps for a Process (and Put the Trace in a File):
truss -d -e -p 1873
truss -d -e -f -o /tmp/dbstart.lst -p 2522

Trace System Calls of a Process and Include a Time Delta on Each Line of Trace Output:
truss -d -D -p 1473

Trace a Process Including Timestamp on Each Line and Include / Exclude Specific System Calls (in this case “read” Syscalls):
truss -d -t read -p 1468
truss -d -t !read -p 1468

Trace a “find” Command (the trace appears on the terminal; the output of find goes to find.out):
truss find . -print >find.out

Trace only the “open”, “close”, “read”, and “write” System Calls:
truss -t open,close,read,write find . -print >find.out

Trace a Shell Script:
truss -f -o truss.out spell document

Abbreviating Output:
truss nroff -mm document >nroff.out

About 97% of that output reports lseek(), read(), and write() system calls. To abbreviate it, exclude them:
truss -t !lseek,read,write nroff -mm document >nroff.out
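To decide which system calls are worth excluding, it helps to see which ones dominate an existing trace. The sketch below is one way to do that under stated assumptions: the trace was saved with truss -o into a file, and each line starts with the call name followed by "(" (no -f or -d prefixes). The function name syscall_histogram is hypothetical. For a live summary, truss -c gives similar counts directly.

```shell
# syscall_histogram FILE
# Hypothetical helper: count system call names in a saved truss trace,
# most frequent first. Assumes lines of the form name(args...) = result.
syscall_histogram() {
    # Split each line at the first "(" and count the text before it.
    awk -F'[(]' 'NF > 1 { n[$1]++ } END { for (s in n) print n[s], s }' "$1" |
        sort -rn
}
```

For example, after truss -o trace.out nroff -mm document >nroff.out, running syscall_histogram trace.out lists the busiest calls first; those are the candidates for the -t ! exclusion list.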

Tracing library calls from within the C library:
truss -u libc

Trace all user-level calls made to any library other than the C library:
truss -u '*' -u !libc -p 1544

Tracing all user-level printf and scanf function calls in the C library:
truss -u 'libc:*printf,*scanf' -p 1100

Trace every user-level function call from anywhere to anywhere:
truss -u a.out -u ld:: -u :: ...

Trace the system call activity of process #1, init:
truss -p -v all 1

Trace a Process exec() Syscalls and Follow its Children:
truss -ftexec -p pid 2> /dev/null &

Trace System Calls of an Oracle Listener with Timestamps and Put the Output in the File “lsnrctl.truss”:
truss -d -o lsnrctl.truss -p 3949

Trace the System Calls and All Signals of the syslogd Process to a File:
truss -o /var/tmp/syslog.truss.out -sall -p `pgrep syslogd`

Trace the System Calls and its Forks, show arguments passed to the exec calls, and the environment variables:
truss -aef -p <PID>

OR:
truss -aef lsnrctl dbsnmp_start

Trace the System Calls and its Forks, show arguments passed to the exec calls, the environment variables, and the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors and follow its children:
truss -aef -rall -wall -p <PID>

Trace the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors and follow its children:
truss -rall -wall -f -p <PID>

Trace verbosely the full contents of the I/O buffer for each read() and write() on any of the specified file descriptors and follow its children:
truss -wall -rall -vall -f /usr/local/sbin/snmpd

Verbosely Trace init:
truss -p -v all 1

Trace the Machine Faults:
truss -mall -p 1200

Exclude the Machine Faults from the Trace:
truss -m!all -p 1200

Machine Faults that Stop the Process (if one of the specified faults is incurred, truss leaves the process stopped and abandoned):
truss -Mall -p 1200

Run truss to Debug read() and write() syscalls as Oracle Listener/DBSnmp Starts:
truss -rall -wall lsnrctl start

Count Total CPU Seconds per System Call:
truss -c dd if=500m of=/dev/null bs=16k count=2k

OR:
truss -d -u a.out,libc dd if=500m of=/dev/null bs=16k count=2k
more a.out

Trace Only the User-level CORBA Library Calls of a CORBA-based Process (all system calls and signals are excluded):
truss -t!all -s!all -u libit_*::CORBA* -p 21922

1.6 Analyze Process by using dbx

To invoke dbx:
dbx program_name

OR:
dbx pid

OR:
dbx -a pid

OR:
dbx -d 100 program_name core_file

OR:
dbx -d 100 -a pid

OR:
dbx - `pgrep Freeway`

At dbx Prompt:
(dbx) run
(dbx) where
(dbx) status

Analyze Process Core by Using dbx:
dbx program_name core

OR:
dbx - core

OR:
dbx a.out core

At dbx Prompt:
(dbx) run
(dbx) where
(dbx) threads
(dbx) status
(dbx) list main
(dbx) print msg
(dbx) check -access
(dbx) check -memuse
(dbx) help
(dbx) quit

1.7 Generate a Process Core Dump on Solaris

coreadm

OR:
savecore -d

If your system still does not create a core file after you enable core file generation, you may need to raise the core file size limit set by your operating system:
ulimit -a
ulimit -c unlimited
ulimit -H -c unlimited
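Before raising the limit, it is worth checking whether the current shell allows core files at all. This is a minimal sketch assuming a POSIX-style shell in which ulimit -c reports the soft core file size limit; the function name core_limit_status is hypothetical.

```shell
# core_limit_status
# Hypothetical helper: report whether the soft core file size limit
# of the current shell permits core files to be written.
core_limit_status() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        # A limit of 0 suppresses core files entirely.
        echo "disabled"
    else
        echo "enabled ($limit)"
    fi
}

core_limit_status
```

If it prints disabled, run ulimit -c unlimited in the same shell before starting the program, since the limit is inherited by child processes.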

Enable Applications to Generate Core Files:
coreadm -g /path-to-file/%f.%n.%p.core -e global -e process -e global-setid -e proc-setid -e log

1.8 Create Process Core

echo ls
gcore ls

1.9 Examining Memory Address Spaces with mdb on Solaris

prstat
top
ps -ef | grep pid
pmap -x 919
mdb -k

Load the dmod containing the new dcmd:
::load /wd320/max/source/mdb/segpages/i386/segpages.so

Walk through the Segments of the Process Address Space, showing Each Virtual Page in the Segment:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages

Count the Pages currently Valid for the Process:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i " valid" | wc

Count the Pages in Memory Not currently Valid in the Page Table(s) for the Process:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i "inmemory" | wc

How Many Pages are Currently Not Valid (and Not in Memory):
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -i " invalid$" | wc

How Large is the Address Space (this should be the total size as reported by pmap):
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !egrep -v OFFSET | wc

How Many Pages have been Swapped Out:
0t919::pid2proc | ::print proc_t p_as | ::walk seg | ::segpages !grep -i swapped | wc
pmap -x 919

1.10 Debug Kernel, System Calls and Processes (DTRACE)

dtrace -f /usr/local/sbin/snmpd
dtrace -l -n tcp::entry
dtrace -l -m tcp
dtrace -lv -n fbt:tcp:_info:entry
dtrace -n 'ufs_read:entry { printf("%s\n",stringof(args[0]->v_path));}'

Get which Process is Making the Most Syscalls:
dtrace -n 'syscall:::entry { @[execname] = count(); }'

OR:
dtrace -n 'syscall::read:entry { @[execname,pid]=count()}'

Get new Processes with Arguments:
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

Files opened by process:
dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

Pages paged in by process:
dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'

Minor faults by process:
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'

System Calls Count by Name:
dtrace -n 'syscall:::entry { @syscalls[probefunc] = count(); }'

Syscall Count by Program:
dtrace -n 'syscall:::entry { @num[execname] = count(); }'

Syscall Count by Syscall:
dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'

Syscall Count by Process:
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'

Syscalls by Type:
dtrace -n 'syscall:::entry { @[probefunc] = count(); }'

Match the syscall probe only when the execname matches our investigation target, filebench, and count the syscall name:
dtrace -n 'syscall:::entry /execname == "filebench"/ { @[probefunc] = count(); }'

Kernel:

Kernel Profiling:
dtrace -n 'profile-997ms / arg0 != 0 / { @ks[stack()]=count() }'

Counting xcalls:
dtrace -n 'xcalls { @[probefunc] = count() }'

Probe Virtual Memory Info on Running Staroffice Process:
dtrace -P vminfo'/execname == "soffice.bin"/{@[probename] = count()}'

OR, with the same clause saved in a script:
dtrace -s ./soffice.d

Successful Signal Details:
dtrace -n 'proc:::signal-send /pid/ { printf("%s -%d %d",execname,args[2],args[1]->pr_pid); }'

Kernel stack trace profile at 1001 Hertz:
dtrace -n 'profile-1001 { @[stack()] = count(); }'

Thread off-cpu stack trace count:
dtrace -n 'sched:::off-cpu { @[stack()] = count(); }'

Adaptive lock block time totals (ns) by kernel stack trace:
dtrace -qn 'lockstat:::adaptive-block { @[stack(5), "^^^ total ns:"] = sum(arg1); }'

Kernel function call counts for module "zfs" by module:
dtrace -n 'fbt:zfs::entry { @[probefunc] = count(); }'

Kernel function call counts for functions beginning with "hfs_" by module:
dtrace -n 'fbt::hfs_*:entry { @[probefunc] = count(); }'

Kernel stack back trace counts for calls to function "arc_read()" (for example):
dtrace -n 'fbt::arc_read:entry { @[stack()] = count(); }'

Identify kernel stacks calling disk I/O:
dtrace -n 'io:::start { @[stack()] = count(); }'

Trace errors along with disk and error number:
dtrace -n 'io:::done /args[0]->b_flags & B_ERROR/ { printf("%s err: %d", args[1]->dev_statname, args[0]->b_error); }'

Look at what is calling semsys:
dtrace -n 'syscall::semsys:entry /execname == "filebench"/ { @[ustack()] = count();}'

Probe Functions:
dtrace -n 'syscall:::entry { @scalls[probefunc] = count() }'

Check which Process is Creating Threads:
dtrace -n 'thread_create:entry { @[execname]=count()}'

CPU:

What are the top user functions running on CPU (% usr time)?
dtrace -n 'profile-997hz /arg1/ { @[execname, ufunc(arg1)] = count(); }'

What are the top 5 kernel stack traces on CPU (shows why)?
dtrace -n 'profile-997hz { @[stack()] = count(); } END { trunc(@, 5); }'

What threads are on CPU, counted by their thread name? (FreeBSD)
dtrace -n 'profile-997 { @[stringof(curthread->td_name)] = count(); }'

What system calls are being executed by the CPUs?
dtrace -n 'syscall:::entry { @[probefunc] = count(); }'

Which processes are executing the most system calls?
dtrace -n 'syscall:::entry { @[pid, execname] = count(); }'

Get Interrupts by CPU:
dtrace -n 'sdt:::interrupt-start { @num[cpu] = count(); }'

Get Functions by Process by CPU:
dtrace -n 'pid221:libc::entry'

Find what is Context Switching Most onto the CPU:
dtrace -n 'sched:::on-cpu { @[execname] = count(); } profile:::tick-20s { exit(0); }'

Memory:

Tracking memory page faults by process name:
dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }'

Process allocation (via malloc()) requested size distribution plot:
dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p PID

Process allocation (via malloc()) by user stack trace and total requested size:
dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID

File System:

Trace file creat() calls with file and process name:
dtrace -n 'syscall::creat*:entry { printf("%s %s", execname, copyinstr(arg0)); }'

Frequency count stat() files:
dtrace -n 'syscall::stat*:entry { @[copyinstr(arg0)] = count(); }'

Tracing "cd":
dtrace -n 'syscall::chdir:entry { printf("%s -> %s", cwd, copyinstr(arg0)); }'

Count read/write syscalls by syscall type:
dtrace -n 'syscall::*read*:entry,syscall::*write*:entry { @[probefunc] = count(); }'

Syscall read(2) by file name:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_pathname] = count(); }'

Syscall write(2) by file name:
dtrace -n 'syscall::write:entry { @[fds[arg0].fi_pathname] = count(); }'

Syscall read(2) by filesystem type:
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_fs] = count(); }'

Syscall write(2) by filesystem type:
dtrace -n 'syscall::write:entry { @[fds[arg0].fi_fs] = count(); }'

Syscall read(2) by process name for the "zfs" filesystem only:
dtrace -n 'syscall::read:entry /fds[arg0].fi_fs == "zfs"/ { @[execname] = count(); }'

Syscall write(2) by process name and filesystem type:
dtrace -n 'syscall::write:entry { @[execname, fds[arg0].fi_fs] = count(); } END { printa("%18s %16s %16@d\n", @); }'

Check Write Entries:
dtrace -n 'syscall::write:entry { trace(arg2) }'
dtrace -n 'fbt:ufs:ufs_write:entry { printf("%s\n",stringof(args[0]->v_path)); }'

Identify who is Responsible for too Much Reading:
dtrace -n 'syscall::read:entry { @Execs[execname] = count(); }'
dtrace -n 'syscall::open:entry { @Open[copyinstr(arg0)] = count(); }'
dtrace -n 'syscall::exec*:entry { trace(execname); }'

Dive into Complex Structures:
dtrace -qn 'syscall::exec*:entry { printf("%5d %s\n",pid,stringof(curpsinfo->pr_psargs)); }'

Count All ioctl System Calls by Both Executable Name and File Descriptor:
dtrace -n 'syscall::ioctl:entry { @[execname, arg0] = count(); }'

Distribution of Write Size by Executable Name:
dtrace -n 'syscall::write:entry { @[execname] = quantize(arg2); }'

Read bytes by process:
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'

Write bytes by process:
dtrace -n 'sysinfo:::writech { @bytes[execname] = sum(arg0); }'

Read size distribution by process:
dtrace -n 'sysinfo:::readch { @dist[execname] = quantize(arg0); }'

Write size distribution by process:
dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'

Disk size by process:
dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->b_bcount); }'

Chase the Hot Lock Caller:
dtrace -n 'pr_p_lock:entry { @s[stack()]=count() }'
dtrace -n 'pr_p_lock:entry { @s[execname]=count() }'
pgrep process_name
dtrace -n 'pid4485:libc:pread:entry { @us[ustack()]=count() }'

Check UFS Read:
dtrace -q -n 'ufs_read:entry { printf("UFS Read: %s\n",stringof(args[0]->v_path)); }'
dtrace -q -n 'ufs_read:entry { @[execname,stringof(args[0]->v_path)]=count() }'

Show disk I/O size as distribution plots, by process name:
dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'

Processes paging in from the filesystem:
dtrace -n 'vminfo:::fspgin { @[execname] = sum(arg0); }'

Which processes are executing common I/O system calls:
dtrace -n 'syscall::*read:entry,syscall::*write:entry { @rw[execname,probefunc] = count(); }'

What is the rate of disk I/O being issued:
dtrace -n 'io:::start { @io = count(); } tick-1sec { printa("Disk I/Os per second: %@d\n", @io); trunc(@io); }'

NFSv3 count of operations by client address:
dtrace -n 'nfsv3:::op-*-start { @[args[0]->ci_remote] = count(); }'

NFSv3 count of operations by file pathname:
dtrace -n 'nfsv3:::op-*-start { @[args[1]->noi_curpath] = count(); }'

Socket Provider:Socket accepts by process name:dtrace -n 'syscall::accept*:entry { @[execname] = count(); }'

Socket connections by process and user stack trace:

Page 16: Debugging Core Files Crash Dumps UNIX Linux

dtrace -n 'syscall::connect*:entry { trace(execname); ustack(); }'

mib Provider:IP event statistics:dtrace -n 'mib:::ip* { @[probename] = sum(arg0); }'

TCP event statistics with kernel function:dtrace -n 'mib:::tcp* { @[strjoin(probefunc, strjoin("() -> ", probename))] = sum(arg0);}'

IP Provider:Received IP packets by host address:dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }'

IP send payload size distribution by destination:dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'

TCP Provider:Who is connecting to what:dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }'

Who isn't connecting to what:dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }'

What am I connecting to?dtrace -n 'tcp:::connect-established { @[args[3]->tcps_raddr , args[3]->tcps_rport] = count(); }'

IP payload bytes for TCP send, size distribution by destination address:dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'

MySQL:

MySQL: query trace by query string:dtrace -n 'mysql*:::query-start { trace(copyinstr(arg0)) }'

MySQL: query count summary by host:dtrace -n 'mysql*:::query-start { @[copyinstr(arg4)] = count(); }'

MySQL server: trace queries:dtrace -qn 'pid$target::*mysql_parse*:entry { printf("%Y %s\n", walltimestamp, copyinstr(arg1)); }' -p PID

MySQL client: who's doing what (stack trace by query):dtrace -Zn 'pid$target:libmysql*:mysql_*query:entry { trace(copyinstr(arg1)); ustack(); }' -p PID

1.11 Other Debugging Tools on Solaris

gcore:

Take a snapshot of a process:gcore -o output_filename pid

kill:

Kill a process and generate its core dump:kill -SEGV <pid>

lsof:

Get File Open by the Specified Process/Command:
lsof -p 28290
lsof -a -p 28290

Check How Many Instances of “sendmail” are Open:lsof -c sendmail

File Descriptors Number:
ps -ef
cd /proc/28290/fd
ls -l | wc -l

Get File Open by the Specified User:lsof -u root

Get FileSystem iNodes:lsof -i /fs

Check Open Files on the specified File System and Processes the use it:lsof /fs

Check iNodes Usage on the specified File System:lsof -i /fs

List All Open Files for the User “abe”, or the User ID 1234, or the Specified Process IDs:lsof -p 456,123,789 -u 1234,abe

Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:lsof -b /nfs/mount/point

Send a SIGHUP signal to All of the Processes that have “/u/abe/bar” Open:kill -HUP `lsof -t /u/abe/bar`

Ignore the Device Cache File:lsof -Di

Get PID and command name field output for each process, file descriptor, file device number, and file inode number for each file of each process:lsof -FpcfDi

List the files at descriptors 1 and 3 of every process running the lsof command for login ID ''abe'' every 10 seconds:lsof -c lsof -a -d 1 -d 3 -u abe -r10

List All Files using Any Protocol on Any Port of mace.cc.org:lsof -i @mace

List All Files using Any Protocol on the Specified Port Range of mace.cc.org:lsof -i @mace:123-140

List All IPv4 Network Files in Use whose PID is 1234:lsof -i 4 -a -p 1234

fuser:

Get Processes and related Username Running on the /var File System:fuser -uc /var

Get Process IDs and Login Names that have the /etc/passwd File Open:fuser -u /etc/passwd

Reports on the File System and Files, restricting the output to Processes that hold Non-blocking Mandatory Locks:fuser -cn /export/foo

Kill Processes Running on the /var File System:fuser -ku /var

Send SIGTERM to Any Processes that hold a Non-blocking Mandatory Lock on the File /export/foo/my_file:fuser -fn -s term /export/foo/my_file

Get Processes Running on the / File System and Print the Processes Name and Arguments:ps -o pid,args -p "$(fuser / 2>/dev/null)"

Report Device Usage Information:fuser -d /dev/dsk/c0t0d0

2. Debug Process and Analyze Process Core on Linux

2.1 Debug Processes by using STRACE

Trace the "ls" Command:strace ls

Trace the "open" System Call of the "ls" Command:strace -e open ls

Trace "open" and "read" System Calls of the "ls" Command:strace -e trace=open,read ls /home

Trace rsync and Log to File:strace -o /tmp/strace_ls_output.txt rsync

Trace a Process by PID and Log to File:strace -o /tmp/strace_rsync_21.06.txt -p pid

Trace "ls" Command and Print Relative Time for System Calls:

strace -r ls

Generate Statistics Report of System Calls for "ls" Command:strace -c ls /home

Trace All System Calls which have a filename as an argument:strace -o /tmp/strace_rsync_output.txt -e trace=file -p pid

Trace All Network Related System Calls:strace -o /tmp/strace_rsync_output.txt -e trace=network -p pid

Trace All File Descriptor Related System Calls:strace -o /tmp/strace_rsync_output.txt -e trace=desc -p pid

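Trace a Process by PID, Printing Timestamps (-tt, -ttt) and the Time Spent in Each System Call (-T); -e verbose=all is the default verbosity:
strace -tttT -o /tmp/s1.lst -p 2395
strace -ttT -p 5164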
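The strace options in this section combine naturally into a small wrapper. The following is a minimal sketch, assuming a hypothetical helper name trace_cmd and a caller-supplied log path; it relies only on the -f (follow forks), -ttt, -T and -o options of strace.

```shell
#!/bin/sh
# trace_cmd LOGFILE COMMAND [ARGS...]
# Run COMMAND under strace, following forks, with epoch timestamps (-ttt)
# and per-call durations (-T), writing the trace to LOGFILE.
# The helper name is an illustrative assumption.
trace_cmd() {
    if [ $# -lt 2 ]; then
        echo "usage: trace_cmd logfile command [args...]" >&2
        return 2
    fi
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not found in PATH" >&2
        return 127
    fi
    log=$1
    shift
    strace -f -tttT -o "$log" "$@"
}
```

For example, "trace_cmd /tmp/ls.trace ls /home" leaves the trace in /tmp/ls.trace and returns the exit status reported by strace.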

2.2 Analyze a Process Core Dump with gdb on Linux

Start gdb on core file:gdb -c core

OR:gdb a.out core

OR:gdb path/to/the/binary path/to/the/core

OR by gdb Prompt:(gdb) core core

If the executable path is not provided, the debugger selects the invocation path of the process that generated the core file. The invocation path information is stored in the core file. If the invocation path is a relative path, you must enter the executable while debugging the core file.

To start debugging the process, at the gdb Prompt, invoke the core file:(gdb) core core

List the Status Inquiry Commands:(gdb) help status

List the Commands for Examining Data:(gdb) help data

List the Commands for Examining the Stack:(gdb) help stack

Select a Stack Frame by its Number:(gdb) frame number

View Code around that Frame:(gdb) list

List Variables:(gdb) info locals

List the Commands for Specifying and Examining Files:(gdb) help files

List the Maintenance Commands:(gdb) help internals

List the Command Aliases:(gdb) help aliases

List the Support Facilities:(gdb) help support

List the Commands for Running the Program:(gdb) help running

List the Commands for Tracing Program Execution without Stopping it:(gdb) help tracepoints

List the User-defined Commands:(gdb) help user-defined

List the Obscure Features:(gdb) help obscure

View the stack trace:(gdb) backtrace

List all threads of the process at the time of the crash:(gdb) info thread

View the specified thread:(gdb) thread thread_id

Disassemble a specified section of memory:(gdb) disassemble <address>

Display the memory contents at a specified address as a string:(gdb) x/s <address>

Display the contents of the registers:(gdb) info registers

Display all registers, including floating point registers:(gdb) info all-registers

Display information about all of the shared libraries:(gdb) info shared

Print the target that is currently under the debugger:
(gdb) info files
(gdb) info target

To view the source code:(gdb) list

Start the target program:(gdb) run

Set a breakpoint:(gdb) break sum

On a line number:(gdb) b 25

At an offset from the current line:
(gdb) b +9
(gdb) b -1

On a memory address (use *):(gdb) b *0x2324

Set a watchpoint on a variable or expression (watch_target is not available in stock GDB; there, use "watch -l x" or "watch *(<type of x> *) <addr of x>" instead):
(gdb) watch x
(gdb) watch_target &x
(gdb) watch_target (<type of x> *) *<addr of x>

Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watch

Help:(gdb) help
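The interactive commands above can also be run non-interactively, which is convenient when many core files have to be triaged. The following is a minimal sketch, assuming a hypothetical helper name core_report; it uses the gdb -batch and -ex command-line options and only commands already listed in this section, plus "thread apply all backtrace".

```shell
#!/bin/sh
# core_report BINARY COREFILE
# Print threads, all backtraces, registers and shared libraries for a core
# file, without entering the interactive gdb prompt.
# The helper name is an illustrative assumption.
core_report() {
    if [ $# -ne 2 ]; then
        echo "usage: core_report binary corefile" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 127
    fi
    if [ ! -r "$2" ]; then
        echo "cannot read core file: $2" >&2
        return 1
    fi
    gdb -batch \
        -ex "info threads" \
        -ex "thread apply all backtrace" \
        -ex "info registers" \
        -ex "info sharedlibrary" \
        "$1" "$2"
}
```

For example, "core_report ./a.out core > core_report.txt" saves the report for later review.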

2.3 Analyze the Process Core using dbx on Linux

To invoke dbx:dbx a.out core

To invoke dbx:dbx program_name

OR:dbx pid

OR:dbx -a pid

OR:dbx -d 100 program_name core_file

OR:dbx -d 100 -a pid

At dbx Prompt:
(dbx) run
(dbx) where
(dbx) threads
(dbx) status
(dbx) list main
(dbx) print msg
(dbx) check -access
(dbx) check -memuse
(dbx) help
(dbx) quit

2.4 Analyze a Core Dump Using Oprofile on Linux

OProfile is a Linux system-wide Profiling Tool to Profile and Analyze Performance and Runtime Problems with Applications, or the Kernel.

Gunzip the Kernel:
cd /boot
gunzip vmlinux-<something>.gz

Run OProfile without Profiling the Kernel:opcontrol --no-vmlinux

If you do want to Profile the Kernel:opcontrol --vmlinux=/boot/vmlinux-`uname -r`

Start Collecting Data:opcontrol --start

Dump the Collected Data:opcontrol --dump

Stop Oprofile:opcontrol --stop

If you want to Reset Profiling Counters:opcontrol --reset

Report Collected Data:opreport

To Report Per-symbol Information:opreport --symbols

OR:opreport -l

To Include Call-graph Information in the Report:opreport -c
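The individual OProfile steps above form one session: reset, configure, start, run the workload, dump, stop, report. The following is a minimal sketch of that sequence, assuming a hypothetical helper name profile_run and a release of OProfile that still ships opcontrol (later releases replaced it with operf); root privileges are normally required.

```shell
#!/bin/sh
# profile_run COMMAND [ARGS...]
# Profile one command with the legacy opcontrol interface, without kernel
# profiling, and print a per-symbol report afterwards.
# The helper name is an illustrative assumption.
profile_run() {
    if [ $# -lt 1 ]; then
        echo "usage: profile_run command [args...]" >&2
        return 2
    fi
    if ! command -v opcontrol >/dev/null 2>&1; then
        echo "opcontrol not found in PATH" >&2
        return 127
    fi
    opcontrol --reset && opcontrol --no-vmlinux && opcontrol --start || return 1
    "$@"                 # the workload being profiled
    rc=$?
    opcontrol --dump     # flush collected samples
    opcontrol --stop
    opreport --symbols
    return $rc
}
```

For example, "profile_run gzip big.tar" would profile one gzip run; the file name is only a placeholder.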

2.5 Debug Libraries and Symbols on Linux

Trace Calls to the Library Functions for the "who" Command:ltrace /usr/bin/who

Trace Calls to the Library Functions for the "ls" Command and Log to File:ltrace -o ls.tr ls

Trace Both Library Calls and System Calls (-S) for the "ls" Command and Log to File:ltrace -S -o ls.tr ls

Check Linked Libraries:ldd filename

Check Module Info:modinfo module_name.ko

The Names of the Files Containing the Object Code and Symbols for Libraries are in the ELF File.

To Read ELF File (readelf needs at least one display option; -a shows everything):readelf -a program_of_interest | less

Disassemble a Program:
objdump -D -S <compiled_object_with_debug_symbols> > filename.out
objdump -d -S module_name.ko > /tmp/whatever

List Symbols:nm /usr/bin/who
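Source-level output from gdb or "objdump -S" depends on the binary carrying debug information. The following is a minimal sketch of a check for that, assuming a hypothetical helper name has_debug_info; it looks for a .debug_info section in the output of "readelf -S" and does not detect debug information kept in separate files.

```shell
#!/bin/sh
# has_debug_info FILE
# Succeed (exit status 0) if FILE contains a .debug_info ELF section.
# The helper name is an illustrative assumption.
has_debug_info() {
    if [ $# -ne 1 ]; then
        echo "usage: has_debug_info file" >&2
        return 2
    fi
    if ! command -v readelf >/dev/null 2>&1; then
        echo "readelf not found in PATH" >&2
        return 127
    fi
    # -S lists the section headers; DWARF data lives in .debug_* sections.
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}
```

For example: if has_debug_info ./a.out; then echo "debug info present"; else echo "no .debug_info section"; fi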

2.6 Other Debugging Tools on Linux

gcore:

Take a snapshot of a process:gcore -o output_filename pid

kill:Kill a process and generate its core dump:kill -SEGV <pid>
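Whether a core file is actually written depends on the core size limit of the target process, so it helps to try the signal on a disposable process first. The following is a minimal sketch, assuming a hypothetical helper name crash_child; it raises the core limit where permitted, sends SIGSEGV to a background sleep, and prints the exit status reported by wait (128 plus the signal number).

```shell
#!/bin/sh
# crash_child: start a throw-away process, kill it with SIGSEGV and print
# the exit status seen by the parent shell.
# The helper name is an illustrative assumption.
crash_child() {
    # Allow core files if the hard limit permits it; ignore failure.
    ulimit -c unlimited 2>/dev/null || true
    sleep 60 &
    child=$!
    kill -SEGV "$child"
    wait "$child"
    echo $?
}
crash_child
```

Where the core file ends up (if at all) is decided by the system configuration, for example /proc/sys/kernel/core_pattern on Linux.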

lsof:

Get File Open by the Specified Process/Command:
lsof -p 28290
lsof -a -p 28290

Check How Many Instances of “sendmail” are Open:lsof -c sendmail

File Descriptors Number:
ps -ef
cd /proc/28290/fd
ls -l | wc -l
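Note that "ls -l | wc -l" also counts the "total" line printed by ls -l. The following is a minimal sketch that counts the descriptors without changing directory, assuming a hypothetical helper name count_fds and a mounted /proc file system.

```shell
#!/bin/sh
# count_fds [PID]
# Print the number of open file descriptors of PID (default: this shell).
# The helper name is an illustrative assumption.
count_fds() {
    pid=${1:-$$}
    fd_dir="/proc/$pid/fd"
    if [ ! -d "$fd_dir" ]; then
        echo "no such directory: $fd_dir" >&2
        return 1
    fi
    # One directory entry per open descriptor; plain ls prints no header line.
    ls "$fd_dir" | wc -l | tr -d ' '
}
count_fds "$$"
```

Reading the fd directory of a process owned by another user normally requires root privileges.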

Get File Open by the Specified User:lsof -u root

Get FileSystem iNodes:lsof -i /fs

Check Open Files on the specified File System and Processes the use it:lsof /fs

Check iNodes Usage on the specified File System:lsof -i /fs

List All Open Files for the User “abe”, or the User ID 1234, or the Specified Process IDs:lsof -p 456,123,789 -u 1234,abe

Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:lsof -b /nfs/mount/point

Send a SIGHUP signal to All of the Processes that have “/u/abe/bar” Open:kill -HUP `lsof -t /u/abe/bar`

Ignore the Device Cache File:lsof -Di

Get PID and command name field output for each process, file descriptor, file device number, and file inode number for each file of each process:lsof -FpcfDi

List the files at descriptors 1 and 3 of every process running the lsof command for login ID ''abe'' every 10 seconds:lsof -c lsof -a -d 1 -d 3 -u abe -r10

List All Files using Any Protocol on Any Port of mace.cc.org:lsof -i @mace

List All Files using Any Protocol on the Specified Port Range of mace.cc.org:lsof -i @mace:123-140

List All IPv4 Network Files in Use whose PID is 1234:lsof -i 4 -a -p 1234

File Open by a Process:
ps -ef
cd /proc/28290/fd
ls -lrt

Process Info:
cd /proc/28290
ls -l
more status
more limits
more io
more mounts
more mountstats

fuser:

Get Process IDs and Login Names that have the /etc/passwd File Open:fuser -u /etc/passwd

Get Verbose Info Including Process IDs and Login Names that have the /etc/passwd File Open:fuser -vu /etc/passwd

Kill Processes Accessing the /var File System in Any Way:fuser -km /var

Get Processes Running on the / File System and Print the Processes Name and Arguments:ps -o pid,args -p "$(fuser / 2>/dev/null)"

Execute the Command "something" if No Process is Using the Specified Device:if fuser -s /dev/ttyS1; then :; else something; fi

Show All Processes at the Local Telnet Port:fuser telnet/tcp

3. Debug Process and Analyze Process Core on HP-UX

3.1 Debug Processes by using tusc

Trace a Process System Calls:tusc pid

Trace a Process System Calls and Count it:
tusc -c pid
tusc -cc pid
tusc -ccc pid

Trace a Process System Calls and Count it adding more Information:
tusc -C pid
tusc -cC pid

Trace the “init” System Calls, Count it adding more Information and Print Process Names:tusc -cCn 1

Trace Verbosely a Process System Calls and Follow Forks:

tusc -vf pid

Trace a Process System Calls, Follow Forks and Print Process Names:tusc -fn pid

Trace Verbosely a Process System Calls, Follow Forks and Print Process ID:tusc -vfp pid

Trace the “bdf /” System Calls and its Forks, Count it adding more Information and Print Process Names:tusc -fcCn bdf /

Trace Verbosely the “bdf /” System Calls and its Forks, and Print Process Names and Timestamp for Each Syscall and Signal:tusc -vfnT bdf /

Trace the “bdf /” System Calls and its Forks, Count it adding more Information, Print Process Names and Execution Time:tusc -fcCnD bdf /

Trace the “bdf /” System Calls and its Forks, and Print Process Names, Duration Time and Timestamp for Each Syscall and Signal:tusc -fnDT bdf /

Trace Verbosely the “bdf /” System Calls and its Forks, and Print Process Names, Duration Time and Timestamp for Each Syscall and Signal:tusc -vfnDT bdf /

Trace the “bdf /” System Calls and its Forks, Count it adding more Information, Print Process Names, Execution Time and Timestamp for Each Syscall and Signal:tusc -fcCnDT bdf /

Trace a Process System Calls, Follow Forks and Keep Tracing Parent even if Parent Exits:tusc -fk pid

Trace a Process System Calls, Printing Process Names and Timestamp for Each Syscall and Signal, and Detach Process if it Enters Traced Mode:tusc -tnT 455

Trace a Process System Calls Concentrating on exec() Functions:tusc -sexec pid 2> /dev/null &

Trace a Process System Calls and File Descriptors and Log to File (lsnrctl.truss):tusc -d -o lsnrctl.truss -p 3949

Trace a Process System Calls and the Specified File Descriptors and Log to File (lsnrctl.truss):tusc -dFileDescriptors -o lsnrctl.truss -p 3949

Trace Verbosely All of the System Calls of the “ps -ef” command:tusc -v -o /var/tmp/syslog.truss.out -sall -p ps -ef

Trace a Process System Calls, and Print Read Buffers for All of the File Descriptors:tusc -rall <PID>

Trace a Process System Calls and its Forks, and Print the Read Buffers for the Specified File Descriptors:/usr/local/bin/tusc -f -r 3,4,5,6 -o /tmp/trace_results /usr/local/sbin/snmpd

Trace a Process System Calls and its Forks, and Print Read and Write Buffers for All File Descriptors:tusc -rall -wall -f <PID>

Trace a Process System Calls and its Forks, and Print Read and Write Buffers for All File Descriptors, but don’t show Interruptible Sleeping Syscalls (-i):tusc -rall -wall -f -i <PID>

Trace a Process System Calls and its Forks, and Print Execution Time:tusc -f -D /usr/local/sbin/snmpd

Trace a Process System Calls and its Forks, and Print exec Arguments and Execution Time:tusc -f -a -D /usr/local/sbin/snmpd

Trace a Process System Calls and Execution Time and Count it:
tusc -c sqlplus "/ as sysdba" << EOF
exit;
EOF

Trace a Process System Calls and Execution Time:
tusc -d sqlplus "/ as sysdba" << EOF
exit;
EOF

Trace Specific System Calls:tusc -s syscall_name 455

Trace Specific Signals:tusc -S signal_name 455

Execute syslog-ng, follow children, print timestamps and Send output to /tmp/tusc.out:tusc -faepo /tmp/tusc.out -v -T %H:%M:%S /opt/syslog-ng/sbin/syslog-ng

Execute sqlplus, follow children, print timestamps and Send output to /tmp/tusc.out:tusc -faepo /tmp/tusc.out -v -T %H:%M:%S sqlplus scott/tiger

Execute sqlplus, follow children and Send output to /tmp/tusc.out:tusc -faepo /tmp/tusc.out -v sqlplus scott/tiger

Attach to a running process and Send output to /tmp/tusc.out:tusc -faepo /tmp/tusc.out -v -T %H:%M:%S -p <pid>

tusc -faepo /tmp/tusc.out -v -p <pid>
tusc -faepo /tmp/tusc.out -p <pid>

Unless advised otherwise, the minimum options used should be:tusc -faepo <output file> ....

Trace Verbosely System Calls and its Forks, Print Environment Variables, Process Names, PIDs, Timestamps and Duration Time:tusc -e -n -p -T '%T' -D -f -v pid

Watch the Log Files to Detect System Problems while Tracing:
tail -f /var/adm/SYSLOG
tail -f /var/adm/messages
tail -f /var/log/syslog
/usr/local/bin/sstep ls

Find the PIDs of the Processes to Trace:
function get_pid {
  (echo foo 0 ${1}; ps -ef) | grep ${1} | grep -v "grep *${1}" | tail -1 | awk '{if ($2 > 0) {print $2} else {print ""}}'
}
/opt/tusc/bin/tusc -o /tmp/tusc.log -v -r all -w all -p -T "%d.%m.%Y %H:%M:%S" `get_pid WorkManager`

OR with Multiple “get_pid”:/opt/tusc/bin/tusc -o /tmp/tusc.log -v -r all -w all -p -T "%d.%m.%Y %H:%M:%S" `get_pid WorkManager` `get_pid SolidDesigner` `get_pid MEls`
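The get_pid function above matches the process name anywhere in the "ps -ef" line, so it can pick up unrelated processes whose arguments contain the same string. The following is a minimal sketch of a stricter variant, assuming a hypothetical helper name get_pid_exact; it compares the command name only, using the POSIX "ps -e -o pid= -o comm=" output format. On HP-UX the -o option is, as far as this sketch assumes, only accepted in UNIX95 mode, so the caller would need UNIX95 set in the environment there.

```shell
#!/bin/sh
# get_pid_exact NAME
# Print the PID of the last listed process whose command name (basename)
# is exactly NAME; print nothing if there is no such process.
# The helper name is an illustrative assumption.
get_pid_exact() {
    ps -e -o pid= -o comm= | awk -v name="$1" '
        {
            n = split($2, parts, "/")      # strip a leading path, if any
            if (parts[n] == name) pid = $1
        }
        END { if (pid != "") print pid }'
}
```

It can be used in the same way as get_pid, for example: /opt/tusc/bin/tusc -o /tmp/tusc.log -p `get_pid_exact WorkManager`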

3.2 Installing tusc on HP-UX 11.xx

Download the tusc package for your HP-UX version and architecture at the following address:http://hpux.connect.org.uk/hppd/cgi-bin/search?term=tusc&Search=Search

Create a temporary directory and upload the depot onto it:mkdir /tmp/tempo_inst

Access the temporary directory and gunzip the depot:
cd /tmp/tempo_inst
gunzip tusc-x.x-xxxx-11.xx.depot.gz
ls -l

Install the depot package by using one of the following methods:

a) By using swinstall (recommended; give the depot as an absolute path):swinstall -s /tmp/tempo_inst/tusc-x.x-xxxx-11.xx.depot \*

OR

b) By manually extracting the tarball and copying the content to the appropriate directories:tar -xf tusc-x.x-xxxx-11.xx.depot

Access the bin subdir of the depot directory and copy its content to the /bin directory:
cd tusc/tusc-RUN/usr/local/bin/
cp * /bin/

Access the man subdir of the depot directory and copy its content to the /usr/local/man/man1 directory:
cd ../man/
cp man1/tusc.1 /usr/local/man/man1/

3.3 Debug Processes and Core Files by using HP WDB / GDB

The HP Wildebeest Debugger (WDB) is an HP-supported implementation of the Open Source GNU debugger (GDB).

HP WDB / GDB can be used to debug or monitor a process, but it is mostly used to analyze the core files of crashed processes and system crash dumps.

Check if HP WDB is installed:swlist -l fileset | grep -i wdb

If HP WDB is not installed, you can download the latest version (6.3) for your HP-UX version and architecture: you need an HP AllianceONE account with appropriate privileges.

Upload the depot file onto the server’s /tmp directory, access the directory and decompress it:
cd /tmp
gunzip hpwdb.xxxx.xxxx.depot.gz

Install the depot:swinstall -s /tmp/hpwdb.xxxx.xxxx.depot \*

The main paths are:
/opt/langtools/wdb
/opt/langtools/gdb
/opt/langtools/bin

To monitor/debug a process:gdb -crashdebug pid

Before analyzing a process core file, check it:
file corefile_name
strings corefile_name

Check if it’s truncated:elfdump -o -S core

To analyze a process core file generated by the snmpd daemon:gdb /usr/bin/snmpd core

OR:gdb /usr/bin/snmpd -c core

OR:gdb -c core

OR:gdb /usr/bin/snmpd

OR to start the HP WDB GUI:wdb /usr/bin/snmpd

OR to start gdb with XDB support:gdb -xdb /usr/bin/snmpd

OR to start gdb with XDB support using Terminal User Interface:gdb -xdb -tui /usr/bin/snmpd

OR:gdb

At gdb Prompt:(gdb) core core

If the executable path is not provided, the debugger selects the invocation path of the process that generated the core file. The invocation path information is stored in the core file. If the invocation path is a relative path, you must enter the executable while debugging the core file.

To start debugging the process, at the gdb Prompt, invoke the core file:(gdb) core core

View the stack trace:(gdb) backtrace

List all threads of the process at the time of the crash:(gdb) info thread

View the specified thread:(gdb) thread thread_id

Disassemble a specified section of memory:(gdb) disassemble <address>

Display the memory contents at a specified address as a string:(gdb) x/s <address>

Display the contents of the registers:(gdb) info registers

Display all registers, including floating point registers:(gdb) info all-registers

Display information about all of the shared libraries:(gdb) info shared

Print the target that is currently under the debugger:
(gdb) info files
(gdb) info target

To view the source code:(gdb) list

Start the target program:(gdb) run

Set a breakpoint:(gdb) break sum

On a line number:(gdb) b 25

At an offset from the current line:
(gdb) b +9
(gdb) b -1

On a memory address (use *):(gdb) b *0x2324

Set a watchpoint on a variable or expression:
(gdb) watch x
(gdb) watch_target &x
(gdb) watch_target (<type of x> *) *<addr of x>

Display a list of breakpoints and watchpoints:
(gdb) info break
(gdb) info watch

Force a core dump and create a core image file for the process under the debugger:(gdb) dumpcore core_filename

Pack the core file along with the relevant executable and libraries in a single tar file for core file debugging on another system:(gdb) packcore

Unpack the tar file that is generated by the “packcore” command so the debugger can use the executable and shared libraries from this bundle, when debugging the core file on a different system from the one on which the core file was originally created:(gdb) unpackcore

To Debug Memory with gdb

Set heap checking options:(gdb) set heap-check [option][on/off]

Detect leaks:(gdb) set heap-check leaks [on/off]

Detect double-frees and frees with improper arguments:(gdb) set heap-check free [on/off]

Check for out-of-bounds corruption:(gdb) set heap-check bounds [on/off]

Set the number of frames to be printed for leak and heap profiles:(gdb) set heap-check frame-count [num]

Produce a heap allocations report:(gdb) info heap [heap.out]

Produce a memory leak report:(gdb) info leaks [leaks.out]

Lists the potential in-block corruptions in all the freed blocks:(gdb) info corruption

Search for a Pattern in the Memory Address Space

(gdb) find &str[0], &str[15], "string_to_search"

(gdb) find &a[0], &a[10], "el", 'l'
where
&a[0] Specifies the start address of the memory address range.
&a[10] Specifies the end address of the memory address range.
"el", 'l' Specifies the pattern.

(gdb) find /1 &int8_search_buf[0], +sizeof(int8_search_buf), 'a', 'a', 'a'
where
/1 Specifies the find command to display only one matching pattern.
&int8_search_buf[0] Specifies the starting address.
+sizeof(int8_search_buf) Specifies the ending address.
'a', 'a', 'a' Specifies the pattern (expr1, expr2, expr3).

(gdb) find /b &int8_search_buf[0], &int8_search_buf[0]+sizeof(int8_search_buf), 0x61, 0x61, 0x61, 0x61
where
/b Specifies that the size of the pattern is 8 bits.
&int8_search_buf[0] Specifies the starting address.
&int8_search_buf[0]+sizeof(int8_search_buf) Specifies the ending address.
0x61, 0x61, 0x61, 0x61 Specifies the pattern (expr1, expr2, expr3, expr4).

Avoid Core File Corruption

To prevent core files from different processes from overwriting one another, set the kernel parameter core_addpid to 1. The core file is then stored in a file named core.<pid> in the current directory. To set the kernel parameter, create a script called “corepid”:

On HP-UX 11i v1 systems:
case $1 in
on) echo "core_addpid/W 1\ncore_addpid?W 1" | adb -w -k /stand/vmunix /dev/kmem;;
off) echo "core_addpid/W 0\ncore_addpid?W 0" | adb -w -k /stand/vmunix /dev/kmem;;
stat) echo "core_addpid/D\ncore_addpid?D" | adb -w -k /stand/vmunix /dev/kmem;;
*) echo "usage $0: on|off|stat";;
esac

On HP-UX 11i v2 systems:
case $1 in
on) echo "core_addpid/W 1\ncore_addpid?W 1" | adb -o -w /stand/vmunix /dev/kmem;;
off) echo "core_addpid/W 0\ncore_addpid?W 0" | adb -o -w /stand/vmunix /dev/kmem;;
stat) echo "core_addpid/D\ncore_addpid?D" | adb -o -w /stand/vmunix /dev/kmem;;
*) echo "usage $0: on|off|stat";;
esac

Then, get the current settings:./corepid stat

To store the core file in the file “core.<pid>” (set core_addpid to 1), run the script:./corepid on

Get the current settings again to check the change:./corepid stat

If you want to stop storing the core file in the file “core.<pid>” (set core_addpid to 0), run the script:
./corepid off
./corepid stat

On HP-UX 11i v3 systems, use “coreadm”, which lets you specify the location and name pattern for core files created by abnormally terminating processes; it also lets you specify a process-specific pattern for the core file name.
To set the global core file settings to include the process ID and the system name in the core file name, and to place the core file in the specified path, <path>, run:coreadm -e global -g <path>/core.%p.%n

Java Core File Debugging

HP WDB shows stack traces of mixed Java, C, and C++ programs for a Java core file.
The GDB_JAVA_UNWINDLIB environment variable must be set to the path name of the Java unwind library.
If the Java and system libraries used by the failed application reside in non-standard locations, then the GDB_SHLIB_PATH environment variable must be set to specify the location of the libraries.

Invoke gdb on a core file generated when running a 32-bit Java application on an Integrity system with /opt/java1.4/bin/java:gdb /opt/java1.4/bin/IA64N/java core.java

Invoke gdb on a core file generated when running a 64-bit Java application on an Integrity system with /opt/java1.4/bin/java -d64:gdb /opt/java1.4/bin/IA64W/java core.java

Invoke gdb on a core file generated when running a 32-bit Java application on PA-RISC using /opt/java1.4/bin/java:gdb /opt/java1.4/bin/PA_RISC2.0/java core.java

Invoke gdb on a core file generated when running a 64-bit Java application on PA-RISC using /opt/java1.4/bin/java -d64:gdb /opt/java1.4/bin/PA_RISC2.0W/java core.java

3.4 Debug Processes by using truss

Trace a Process System Calls:truss -p pid

Trace a Process System Calls and Count it:truss -c -p pid

Trace a Process System Calls and Count it adding more Informations:truss -C -p pid

Trace a Process System Calls and Follow Forks:truss -f -p pid

Trace a Process System Calls Concentrating on exec() Functions:truss -sexec -p pid 2> /dev/null &

Trace a Process System Calls and File Descriptors and Log to File (lsnrctl.truss):truss -d -o lsnrctl.truss -p 3949

Trace All of the System Calls of the “ps -ef” command:truss -o /var/tmp/syslog.truss.out -sall -p ps -ef

Trace a Process System Calls, and Print Read Buffers for given File Descriptors:truss -rall -p <PID>

Trace a Process System Calls and its Forks, and Print Read and Write Buffers for given File Descriptors:truss -rall -wall -f -p <PID>

Trace a Process System Calls and its Forks, and Print Execution Time:truss -f -D /usr/local/sbin/snmpd

Trace a Process System Calls and its Forks, and Print exec Arguments and Execution Time:truss -f -a -D /usr/local/sbin/snmpd

Trace a Process System Calls and Execution Time and Count it:
truss -c sqlplus "/ as sysdba" << EOF
exit;
EOF

Trace a Process System Calls and Execution Time:
truss -d sqlplus "/ as sysdba" << EOF
exit;
EOF

Verbosely Trace init:truss -p -v all 1

Run truss on a Command:truss -d date

Run truss to Debug Application Start:
truss -rall -wall lsnrctl start
truss -aef lsnrctl dbsnmp_start

nohup /opt/tusc/bin/truss -o /tmp/syslog-ng.truss -aef /usr/local/sbin/syslog-ng --debug --foreground --stderr > syslog-ng.out 2>&1 &

grep syslog-ng.conf /tmp/syslog-ng.truss

3.5 Analyze Process Performance by using Caliper on HP-UX 11.xx

HP Caliper is a general-purpose performance analysis tool for applications on HP-UX and Linux systems running on HP Integrity Servers.

If it is not installed, you can download the current Caliper version 5.5: you need an AllianceONE account with appropriate privileges.

Upload the depot file on the server, gunzip it and install it:
gunzip caliper.xx.xxxx.depot.gz
swinstall -s caliper.xx.xxxx.depot

You can use an initialization file (called .caliperinit) with Caliper; Caliper reads it automatically at startup for data collection or data reporting runs. Putting the options in an initialization file simplifies the command line you use. This file is not required, but can be useful.
The file is read only if the “--read-init-file” option is set to “True”.
You can find a sample initialization file in the Caliper home, under the examples/startup_file/caliperinit directory: rename it to .caliperinit.
Here is an example of the content:
********************************************************************
# Options applied to all report types.
application = 'myapp'
arguments = '-myarg 2'
context_lines = 0,3
summary_cutoff = 1
detail_cutoff = 5
source_path_map = '/proj/src,/net/dogbert/proj/src:/home/wilson/work'
# Report-specific options.
if caliper_config_file == 'branch':
    sort_by = 'taken'
elif caliper_config_file == 'fprof':
    sort_by = 'fcount'
    report_details = 'statement'
    context_lines = 'all'
# Apply an option to a subset of reports.
if caliper_config_file in ("fcount"):
    module_exclude = '/usr/lib/'
********************************************************************

Caliper uses measurement configuration files, which you can edit or create according to your needs. You can find them in /opt/caliper:
cd /opt/caliper
ls -lrt

The measurement configuration files provided with HP Caliper and the main performance measurements they take are the following:

alat: measures and reports sampled advance load address table (ALAT) misses
branch
cgprof: measures and reports a call graph profile, produced by instrumenting the application code
cpu: ... and per-process metrics
cstack
cycles
dcache: data cache measurement
ecount: total CPU event counts
fcount: function call counts
fprof: function profile
icache: instruction cache metrics
scgprof: measures and reports (an inexact) call graph profile, produced by sampling the PMU to determine function calls
traps: collects and reports a profile of traps, interrupts, and faults

fprof (flat profile) shows the parts of the process that have the highest CPU usage:caliper fprof ./binary_name

Show the parts of the process that have the highest CPU usage reporting both source and instructions (-r all) and logging the output to file:caliper fprof -o out.txt -r all

Run the default measurement, scgprof:caliper ktrace

Run functions call count measurement:caliper fcount ktrace

CPU measurement for the specified application or process:caliper cpu my_new_app

System-wide CPU measurement (log output to file):caliper cpu -w -e 120 -o cpu.txt

Measure CPU and Memory for the specified process and report:caliper cpu -o REPORT --memory-usage=all my_app

Measure CPU and system usage for the specified process and report:caliper cpu -o REPORT --system-usage=all my_app

Create a call graph profile with HP Caliper:caliper scgprof [caliper_options] program [program_arguments]

Create a report:caliper report [options]

The overview measurement enables collecting fprof, dcache, and cstack data in one single collection run:caliper overview -o rpt my_app

Collect system-wide fprof and dcache data for a duration of 300 seconds:caliper overview -w -e 300 -o rpt

Override the sampling_spec setting in pmu_trace:caliper pmu_trace -s period,variation,cpu_event program

Override the events to be measured in ecount on HP-UX:caliper ecount -m cpu_event,cpu_event program

Override the kernel stop functions and get all frames in the cstack on HP-UX:caliper cstack --stop_functions = "" program

Create a call stack profile report in the file named results.save when profiling the program enh_thr_mutex1:/opt/caliper/bin/caliper cstack -o results.save enh_thr_mutex1

To stop Caliper:kill -s INT caliper_process_ID

3.6 Analyze Process Performance by using Prospect on HP-UX 11.xx

Prospect is a performance analysis tool. On HP-UX, Prospect uses the Kernel Instrumentation (KI) tracing and Kernel Timing Clocks (KTC) package. The data that Prospect collects from the kernel covers only a "window of time".
Prospect is available on HP-UX (PA-RISC 64-bit kernel).

If it is not installed, you can download the current Prospect version 2.6.1: you need an AllianceONE account with appropriate privileges.

Upload the depot file on the server, gunzip it and install it:
gunzip prospect.xx.xxxx.depot.gz
swinstall -s prospect.xx.xxxx.depot \*

You can use Prospect to profile Java applications on HP-UX.
Prospect has additional features for profiling Java applications when running an HP JVM on HP-UX: to activate these features you must install HP Hotspot JVM version 1.3.1.02 or later.

Obtain symbols of JVM compiled methods in Prospect's output:prospect -V3 -foutput java -XX:+Prospect Qsort 1000000

Profile a JVM process with the specified PID:prospect -j1495 -V4 -foutput2 sleep 20

Profile the specified process or application:
prospect my_app

Verbosely profile the specified process or application:prospect -v my_proc

You can use Prospect as a statistical profiler to extract function or assembly level profiles and exact system call timings for processes of interest.

In order to use Prospect in this mode, you first need to activate KI and keep it active. This is done via the daemon mode of Prospect:prospect -P

This mode does not consume any processor resources; it is used only to keep the KI trace active.

Prospect collects data over an interval of time. Use KI and distill the output for the immediate child of Prospect (in this case, my_app), and output the summary, memory maps, profiles, and system call tables into a file called "output":
prospect -V 2 -f output my_app

Outputs only information of the direct descendant child:prospect -V2 -f output1 my_app

Record all traces sampled in the time my_app ran into a binary file called "Tfile1":prospect -T Tfile1 my_app

Read the trace out of a file:prospect -t Tfile1 -f output42

Sample the kernel for 120 seconds and output the results to a file called "output":
prospect -V k -f output sleep 120

See how the kernel is performing while a specific application is running and also how that application is performing; put the kernel profile in file "kern_output" and your user process profile (my_app) in file "proc_output":
prospect -TTfile my_app
prospect -tTfile -Vk -fkern_output
prospect -tTfile -V2 -fproc_output

Start the program to be profiled under prospect --hprof (hierarchical profile), generate a user time profile of the gzip run and save the output to a file:
prospect --hprof --output-file=hprof.out gzip firebolt.tar

Start the program to be profiled under prospect --hprof (hierarchical profile) and generate a user time profile of the gzip run with a sampling interval of 100 ms:
prospect --hprof --sampling-interval=100 gzip firebolt.tar

Generate an HP Caliper-like fprof report:prospect --fprof --output-file fp.out ./qsort32

Attach to a running process specified by process ID:prospect --fprof --output-file fp.out --attach=1234

Create a binary trace file:prospect --fprof --datafile=fp.cdf ./qsort32

Generate fprof report from the binary trace file:prospect --report --datafile=fp.cdf -o fp.out

Profile for a particular duration of time:prospect --fprof -o fp.out --duration=5 ./loop 10000000

Specify function summary cutoffs:prospect --fprof --summary-cutoff=,80 ./wordplay

Specify function details cutoff:
prospect --fprof ./qsort32 (collect mode)
prospect --report --detail-cutoff=,80 ./qsort32 (report mode)

Generate single report for a multithreaded application with the results of all threads aggregated together:prospect --fprof --thread=sum-all ./threadsthread

Report per-thread data for a multithreaded application:prospect --fprof --thread=all -o fp.out ./threadsthread

Report per-module data for a multithreaded application:prospect --fprof --per-module-data=TRUE --thread=all ./threadsthread

Exclude load modules:prospect --fprof --thread=all --module-exclude=/usr/lib ./threadsthread

Include load modules:prospect --fprof --thread=all --module-default=none --module-include=threadsthread ./threadsthread

Collect profile data till the processes specified terminate:prospect -V6 pid1,pid2,pid3 -f log

Collect profile data for a specified duration of time:prospect -V6 pid1,pid2,pid3 -f log sleep <duration>

Get a raw ASCII file of KI trace records:
prospect -T BinTraceFile sleep 30
prospect -t BinTraceFile -F AsciiTraceFile

Prospect KI kernel buffer freeing:
kill <prospect -P daemon>
prospect -a

Prospect KI buffer sizing:
kill <prospect -P daemon>
prospect -a
prospect -A 4194304
prospect -P

To find out how much lockable memory your system has:dmesg | grep lockable

3.7 Live Memory Analysis on HP-UX 11.xx by using KWDB

KWDB can analyze a live system to find memory leaks, performance issues and more.

Find the pathname of the currently running vmunix:kmpath

KWDB on PA requires the kernel file to be preprocessed by pxdb (change the kernel filename if it is not the standard /stand/vmunix):pxdb /stand/vmunix

Start KWDB with Q4 support to debug the kernel file, and set up the devmem target to read from /dev/mem and /dev/kmem:kwdb -q4 /stand/vmunix /dev/kmem

OR:
kwdb /stand/vmunix
(kwdb) target devmem
(kwdb) set kwdb q4 on

OR you can also run:q4 /stand/vmunix /dev/kmem

At the q4 prompt:
q4> load struct utsname from &utsname
q4> print -tx

Get a listing of all the structures and typedefs that contain the string of characters “callout”:q4> cat callout

Get a listing of all the fields defined in a callout structure:q4> fields -cx struct callout

Load all the callout structures from the callout table:q4> load struct callout from callout max ncallout

List all the different flag fields in these structures:q4> print c_flag | sort -u

Keep only those callout structures with the PENDING_CALLOUT flag set:q4> keep c_flag & PENDING_CALLOUT

List all the different function addresses pointed to by these structures:q4> print -x var.real_callout.cc_func | sort -u

Get the names of the kernel routines found in the previous step:
q4> examine 0x191a08 using a
q4> ex 0x19e3e8 using a
q4> ex 0x8c230 using a

Display the instructions of the functions:q4> conde unselect

Look into the near term, mid term and far future events. Load the near term callout headers and list the different types from the flag fields:
q4> load struct callout from callout_time_nr max ncallout until callout_time_md
q4> print c_flag | sort -u

List the absolute time fields for the headers:q4> print indexof c_abs_time_hi c_abs_time_lo

Load the mid term callout headers and print the absolute time fields for the headers:q4> load struct callout from callout_time_md max 256

Load the callout header for far future events (there is only a single header for all far future events) and display its contents:
q4> load struct callout from callout_time_ff
q4> print -tx

Load the linked list of structures associated with this and print types and absolute times for each of them:
q4> load struct callout from c_time_next max ncallout next c_time_next
q4> print c_abs_time_hi c_abs_time_lo c_flag

Load the hash headers and display flags, times and links:
q4> load struct callout from callout_hash max 256
q4> print -x flag c_abs_time_lo c_time_next c_hash_next

Load two of the expired headers and display all the fields:
q4> load struct callout from callout_hash skip 256 max 2
q4> print -tx

3.8 Other Debugging Tools on HP-UX 11.xx

gcore:Take a snapshot of a process:gcore -o output_filename pid

kill:Kill a process and generate its core dump:kill -SEGV <pid>

lsof:Check Open Files on the specified File System and the Processes that use it:lsof /fs

Check How Many Instances of “sendmail” are Open:lsof -c sendmail

Check iNodes Usage on the specified File System:lsof -i /fs

Check Files Opened by the Specified User:lsof -u user_name

List All Open Files for the Login Name "abe", the User ID 1234, or the Process IDs 456, 123 and 789:lsof -p 456,123,789 -u 1234,abe

Find processes with open files on the NFS filesystem /nfs/mount/point whose server is inaccessible, presuming your mount table supplies the device number for /nfs/mount/point:lsof -b /nfs/mount/point

Send a SIGHUP signal to All of the Processes that have "/u/abe/bar" Open:kill -HUP `lsof -t /u/abe/bar`

Ignore the Device Cache File:lsof -Di

Get PID and command name field output for each process, file descriptor, file device number, and file inode number for each file of each process:lsof -FpcfDi

List the files at descriptors 1 and 3 of every process running the lsof command for login ID ''abe'' every 10 seconds:lsof -c lsof -a -d 1 -d 3 -u abe -r10

List All Files using Any Protocol on Any Port of mace.cc.org:lsof -i @mace

List All Files using Any Protocol on the Specified Port Range of mace.cc.org:lsof -i @mace:123-140

List All IPv4 Network Files in Use whose PID is 1234:lsof -i 4 -a -p 1234

fuser:Get Processes and related Username Running on the /var File System:fuser -uc /var

Get Process IDs and Login Names that have the /etc/passwd Files Open:fuser -u /etc/passwd

Get Processes Running on the Specified Device:fuser -xc /dev/hd3

Kill Processes Running on the /var File System:fuser -ku /var

Get Processes Running on the / File System and Print the Processes Name and Arguments:ps -o pid,args -p "$(fuser / 2>/dev/null)"

A Debugging Example:
type midaemon
file `which midaemon`
what `which midaemon`
ldd `which midaemon`
grep -i midaemon /etc/*
grep -i midaemon /etc/init.d/*
swlist -l file | grep midaemon
lsof -c midaemon
ps -elf | sed -n '1p; /midaem[.]*on/p;'
lsof | sed -n '1p; / 17949 /p'
lsof | sed -n '1p; / 17923 /p'
tusc 2198
strings `which midaemon` | head -n 7
tail -n 30 /var/opt/perf/status.mi

4. Debug Process and Analyze Process Core on IBM AIX

4.1 Debug Processes by using proctools

Proctools are similar to Solaris ptools: see Solaris Section about ptools.

Get Process Stack Trace:procstack

Prints Pending and Held Signals for Process:procflags

Display Signal Action and Handlers for Process:procsig

Report stat and fcntl Info for All Open Files in Each Process:procfiles -n pid

Print the Current Working Directory of the Process:procwdx

Display the Process Tree:proctree

4.2 Debug Processes by using trace

The IBM AIX trace tool is conceptually similar to Linux strace.

Use trace Interactively:
trace
> !anycmd
> q

Start trace Asynchronously:trace -a; anycmd; trcstop

Trace the System for 10 Seconds:trace -a; sleep 10; trcstop

Output Tracing Data to a Specified Log File (Instead of the Default /var/adm/ras/trcfile):trace -a -o /tmp/my_trace_log; anycmd; trcstop

Trace the Process "mydaemon" which is Currently Running:trace -A mydaemon-process-id -Pp

Trace a "cp" Command, Excluding Specific Events - in this case, lockl and unlockl functions (20e and 20f events):trace -a -k "20e,20f" -x "cp /bin/track /tmp/junk"

Trace a "cp" Command, Excluding Specific Events - in this case, lockl and unlockl functions (20e and 20f events) - and Produce a Raw Trace Output File:trace -a -k "20e,20f" -o trc_raw ; cp ../bin/track /tmp/junk ; trcstop

Trace Hook 234 and the Hooks that Allow you to See the Process Names (in this case trace the event group tidhk plus hook 234):trace -a -j 234 -J tidhk

Trace Using One Set of Buffers per Processor. The Command will Produce the Files /var/adm/ras/trcfile, /var/adm/ras/trcfile-0, /var/adm/ras/trcfile-1, etc. up to /var/adm/ras/trcfile-(n-1), where n is the number of processors in the system:
trace -aC all

Trace a Program that Starts a Daemon Process And Continue Tracing the Daemon after the Program:trace -X "mydaemon"

Capture PURR, PMC1 and PMC2:trace -ar "PURR PMC1 PMC2"

Format a trace Raw Output as a Report:trcrpt -O "exec=on,pid=on" trc_raw > cp.rpt

Format a trace Raw Output as a Report, Excluding the VMM Activity Detail:trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2

Format a trace Output which Consists of Multiple Files:trcrpt -C all -r trace.out > trace.tr

Reading a trace Report:trace -a -k 20e,20f -o trc_raw

Filter the trace Report Searching for the Event ID for the open() System Call:trcrpt -j | grep -i open

Filter the trace Report by Checking the Event ID 15b:trcrpt -d 15b -O "exec=on" trc_raw

Filter the trace Report to Display Only the open() Subroutines:trcrpt -d 15b -p cp -O "exec=on" trc_raw

To Format a trace Output from a System as a Report on Another System, run:trcnm > trace.nm

OR Copy Also the /etc/trcfmt of the Traced System (as the Other System could have Different trace Format Stanzas):trcrpt -n trace.nm -t trcfmt_file -o newfile

And then Run trcrpt on the Other System:trcrpt -n trace.nm -o newfile

Generate CPU Report from a trace:
curt -i trace.r -o outputfile
curt -i trace.raw -m trace.nm -o outputfile
curt -e -i trace.r -m trace.nm -n gensyms.out -o curt.out
curt -s -i trace.r -m trace.nm -n gensyms.out -o curt.out
cat curt.out

trace -n -C all -d -j 100,101,102,103,104,106,10C,134,139,200,215,419,465,47F,488,489,48A,48D,492,605,609 -L 1000000 -T 1000000 -afo trace.raw
curt -i trace.raw -n gensyms.out -o curt.out
cat curt.out

Generate Input File for curt:
HOOKS="100,101,102,103,104,106,10C,119,134,135,139,200,210,215,38F,419,465,47F,488,489,48A,48D,492,605,609"
SIZE="1000000"
export HOOKS SIZE
trace -n -C all -d -j $HOOKS -L $SIZE -T $SIZE -afo trace.raw
export LIBPATH=/usr/ccs/lib/perf:$LIBPATH
trcon ; pthread.app ; trcstop
unset HOOKS SIZE
ls trace.raw*
trace.raw trace.raw-0 trace.raw-1 trace.raw-2 trace.raw-3
trcrpt -C all -r trace.raw > trace.r
rm trace.raw*
ls trace*
trace.r
gensyms > gensyms.out
trcnm > trace.nm

4.3 Debug Processes by using syscalls

The System Crashes if ipcrm -M sharedmemid is Run after syscalls has been run. Run stem -shmkill instead of Running ipcrm -M to Remove the Shared Memory Segment.

Display System Calls Count:
syscalls -start
syscalls -c

Collect System Calls for a Program:syscalls -x /bin/ps

Trace a Process and Log to File:syscalls -o filename -p pid -start

Simulate the C Code Fragment:
output=open("x", 401, 0755);
write(output, "hello", strlen("hello"));

Run:
syscall open x 401 0755 \; write \$0 hello \#hello

4.4 Debug Processes by using watch

Watch All Files Opened by the "bar" Command:watch -e FILE_Open /usr/lpp/foo/bar -x

Watch All Files Opened by the "bar" Command and Log to File:watch -e FILE_Open /usr/lpp/foo/bar -x -o output_file

Watch the Installation of the Specified Program:watch /usr/sbin/installp xyzproduct

4.5 Debug Processes by using ProbeVue

Start ProbeVue with a Script:
probevue myscript.e
probevue <myscript.e

Running ProbeVue on a Program:
probevue -X progname -A prog-arguments myscript

Format ProbeVue Output as CSV File:
probevue -X /usr/bin/tar -A "-cf /dev/null /scratch/bcobb/probevue" ./p2.e | tee t.csv

Example of ProbeVue Script to Monitor a Program:
#!/usr/bin/probevue
double engine(int p1, int p2);
@@uft:$1:*:engine:entry
{
    printf("PID=%d TID=%d PPID=%d PGID=%d UID=%d EUID=%d InKernel=%d\n", __pid, __tid, __ppid, __pgid, __uid, __euid, __kernelmode);
    printf("ProgName=%s errno=%d\n", __pname, __errno);
    printf("---\n");
    stktrace(GET_USER_TRACE,20);
    printf("+++\n");
    stktrace(PRINT_SYMBOLS|GET_USER_TRACE,20);
    exit;
}

4.6 Debug Processes by using truss

See Solaris Section about truss:
truss -deaf -o truss.out program

4.7 Debug Processes by using dbx

See Solaris and Linux Sections about dbx:
dbx exe core

4.8 Analyze a Processes Core by using KDB

Start analyzing a Core:kdb dump

At kdb Prompt, Display Status:>stat

Initial CPU Context:>cpu 1

VMM Error Log:>vmlog

Process Info:>proc *

Get Threads:> thread *

pid Output:>p 3

4.9 Other Debugging Tool on IBM AIX

gcore:Take a snapshot of a process:gcore -o output_filename pid

kill:Kill a process and generate its core dump:kill -SEGV <pid>

Other:Get which Application Created the Core:lquerypv -h core 500 64

List the Available Logical Processors:bindprocessor -q

Show if 64-bit Kernel is Active:bootinfo -K

Show whether the Hardware in Use is 32-bit or 64-bit:bootinfo -y

Check libraries loaded by the specified process:
ps -u sj1e652a | grep WILogin
procldd 21922

Dump a library looking for API-type exported symbols:
dump -Tv bin/orb/shlib/libit_art5_xlc50.so 2>&1
dump -Ctv E652/bin/WIReportServer

ps -u sj1e652a | grep WILogin
procldd 21922
dump -Tv bin/orb/shlib/libit_art5_xlc50.so 2>&1| grep EXP | c++filt | more
truss -t!all -s!all -u libit_*::CORBA* -p 21922

dump -Ctv E652/bin/WIReportServer | grep FUNC.*GLOB.*9.*dgWICDZ_i
ps -u sj2e652s -o pid,args | grep WIReportServer
truss -t!all -s!all -u a.out::*dgWICDZ_* -p 18846 2>&1 | tee -a out.txt
cat out.txt | c++filt
pldd 18846
truss -t!all -s!all -u libclntsh -p 18846 2>&1 | tee -a out.txt

dump -H
ldd
procfiles
lockstat -IWk example_tnf 24

InterProcess Communication Facilities:ipcs

System Attributes (Entries Marked as "True" are Configurable):lsattr -l sys0 -E

Change the High/Low Water Marks for Pending Write I/Os per File:chdev -l sys0 -a maxpout=9 -a minpout=6

Process Profiling:pprof

Paging Space Statistics:pstat -s

System Variables:pstat -T

Paging Statistics:lsps -a

Display Path Name from iNode Number:ncheck -i <inode>

List Files and grep for the iNode:ls -ail |grep <inode>
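
The two inode lookups above can also be approximated with find, which searches a whole directory tree rather than one directory. This is only a sketch: the function name path_from_inode is hypothetical, and the -xdev and -inum primaries are assumed to be supported by the local find (they are common but not guaranteed on every UNIX).

```shell
# Hypothetical helper: print every path under a directory that has the given inode number.
# -xdev keeps find on a single filesystem, because inode numbers are only unique per filesystem.
path_from_inode() {
    dir=$1
    inode=$2
    find "$dir" -xdev -inum "$inode" -print 2>/dev/null
}
```

For example, `path_from_inode /var 12345` would list the paths under /var that share inode 12345 (hard links appear as several paths).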

Report Placement of File Blocks:fileplace -pv /unix

Monitor Activity at All FileSystem Levels and Write the Results to /tmp/filemon.log:
filemon -o /tmp/filemon.log -O all
trcstop

CPU Profile:tprof

CPU Usage Statistics:
netpmon -o /tmp/netpmon.log -O all
trcstop

dkvis
nfsvis
systat
mpvis
dkstat

5. Debug Process and Analyze Process Core on IRIX

gcore:Take a snapshot of a process:gcore -o output_filename pid

kill:Kill a process and generate its core dump:kill -SEGV <pid>

par
prfstat
SystemTap
lockstat -IWk example_tnf 24

6. Debug Process and Analyze Process Core on Tru64

gcore:Take a snapshot of a process:gcore -o output_filename pid

kill:Kill a process and generate its core dump:kill -SEGV <pid>

trace
truss
atom -tool ptrace

odump -Dl
ldd
lockstat -IWk example_tnf 24
lockinfo

7. Generate / Analyze a Crash Dump on Solaris

7.1 Save a Crash Dump on a Panic’d System

Check if savecore is Enabled:/etc/init.d/sysetup

Get Core Dump (or Crash Dump) Configuration:coreadm

Save a Crash Dump of the Running Solaris System (without actually rebooting or altering the system):savecore -Lv

OR:savecore -d

Save a Crash Dump (Rebooting the System):reboot -d

OR:uadmin 5 #

OR generate a system panic:adb -k -w /dev/ksyms /dev/mem -> rootdir/W 0 -> ls /

If after enabling core file generation your system still does not create a core file, you may need to change the file-size writing limits set by your operating system:
ulimit -a
ulimit -c unlimited
ulimit -H -c unlimited
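
As a rough illustration of the limit check above, the following sketch reports the current soft core-file limit and tries to raise it. The function name ensure_core_limit is hypothetical, and the sketch assumes a POSIX-style shell whose ulimit builtin accepts -c and -H.

```shell
# Hypothetical helper: report the soft core-file size limit and try to raise it.
ensure_core_limit() {
    current=$(ulimit -c)
    if [ "$current" = "unlimited" ]; then
        echo "core limit already unlimited"
        return 0
    fi
    # Raising the soft limit fails when the hard limit is lower than the request.
    if ulimit -c unlimited 2>/dev/null; then
        echo "core limit raised from $current to unlimited"
    else
        echo "could not raise core limit (hard limit: $(ulimit -H -c))"
    fi
}
ensure_core_limit
```

The change applies only to the current shell and the processes it starts, so the function has to run in the same shell that launches the program to be debugged.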

Check Generated Core Dump on Solaris:
ls -lrt /var/crash/sunbkl01
cd /var/crash/sunbkl01
pstack vmcore
file vmcore
strings vmcore

7.2 Setup a System to Save a Crash Dump

Disable / Enable the Saving of Crash Dumps:
dumpadm -n
dumpadm -y

Enable Compressed Crash Dump (Default):dumpadm -z on -y

Enable Uncompressed Crash Dump (it Uses Much More Space):dumpadm -z off -y

Check dumpadm Configuration:
more /etc/dumpadm.conf
dumpadm

Setup System for Full Crash Dump: dumpadm -c all -d /dev/md/dsk/d201 -s /var/crash/vasdbs02

OR Setup System for Dumping Kernel Memory Pages Only (it Saves Space and Time, but it's Less Accurate and Less Useful for Debugging a Problem):dumpadm -c kernel -d /dev/md/dsk/d201 -s /var/crash/vasdbs02

OR Setup System for Dumping Kernel Memory Pages and the Memory Pages of the Process whose Thread was Executing on the CPU on which the Crash Dump was Initiated. If the Thread Executing on that CPU is a Kernel Thread Not Associated with any User Process, Only Kernel Pages will be Dumped:
dumpadm -c curproc -d /dev/md/dsk/d201 -s /var/crash/vasdbs02
more /etc/dumpadm.conf

Reconfigure the Dedicated Dump Device and Directory on which Crash Dumps will be Saved:dumpadm -d /dev/dsk/c0t2d0s2 -s /var/crash/server_name

OR (on a System using SVM):dumpadm -d /dev/md/dsk/d201 -s /var/crash/vasdbs02

OR Reconfigure the Dedicated Dump Device on Swap:dumpadm -d swap

Restart Dumpadm Service and Check:
svcadm restart svc:/system/dumpadm:default
svcs -a | grep -i dumpadm

To set up a method to automatically save crash dumps on older versions of the Solaris OS, or on servers where dumpadm is not installed, you can create a script /etc/init.d/sysetup with the following content:
if [ ! -d /var/crash/`uname -n` ]
then
    mkdir -p /var/crash/`uname -n`
fi
echo 'checking for crash dump...\c '
savecore /var/crash/`uname -n`
echo ' '
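
The directory handling in the script above could be factored into a small function, which makes it easy to try outside /var/crash. This is a sketch only: the name crash_dir_for_host and the optional base-directory argument are assumptions made for illustration and are not part of the original script.

```shell
# Hypothetical helper: create (if needed) and print the per-host crash directory.
# The base directory is a parameter so the function can be tested away from /var/crash.
crash_dir_for_host() {
    base=${1:-/var/crash}
    dir=$base/$(uname -n)
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    printf '%s\n' "$dir"
}
```

On a system that provides savecore, the boot script body could then shrink to something like `savecore "$(crash_dir_for_host)"`.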

7.3 Crash Dump Analysis on Solaris by using MDB

The Solaris Modular Debugger (mdb) is a powerful debugger that replaces the adb and crash utilities, which you can still find on Solaris systems alongside mdb.

Access the crash dump directory and check files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore

Invoke the mdb Debugger:mdb -k unix.0 vmcore.0

OR:mdb -k 0

At the mdb prompt, get the time of the crash:
*time-(*lbolt%0t100)=Y
::time/Y

Get the core information:
::coreinfo

Get crash information:
::system
::status

Display the panic string:::panicinfo

Display the stack trace:::stack

Display the message buffer (containing the panic string):::msgbuf

Display the crash log:::crashlog

Get CPU information at the time of the crash:
::cpuinfo -v

Get semaphore information at the time of the crash:
::ipcs
::dnlc

Display the thread list at the time of the crash:
::threadlist
::tlist killed
::tlist pctcpu

Show the kernel memory structures and the kernel memory log:
::kmastat
::kmalog
::ksemid
::kshmid
::kstat xck filename
::mdump/rd\* -P
::nvlist
::slist

Show memory information at the moment of the crash:
::memstat
::memerr
::meminfo tree process
::meminfo user command
::meminfo -m user

Show symbol and process information (including the process tree):
::nm
::symbols
::ps
::ps -z
::pgrep processname
::ptree
::proc

Show open files at the moment of the crash:::pfiles

Display the callouts and the memory walkers:
::callout
::walkers

Display the CPU cycle information:
::cycinfo -v

Display disk, slice, partition table, SVM and ZFS information:
::vfstab
::svm -i
::svm [-s <set>] [-d <devnum>]
::zfs

Get pool information:
::pool

Get NFS and shared filesystem information:
::nfs
::autofs

Get file lists at the time of the crash:::findfiles

Get cluster information:
::clust

Get zone information:
::zone

Get information about the previously selected structure:
::whatis -P

Get network interface information at the time of the crash:
::ifconf
::netstat

Display memory dump information:
::pkma -fslL
::scatenv mdump_compression

Get the alternate CPU walk and follow it:
::scatenv alternate_cpu_walk
ffffffffaaaf8760::whatis
30018ca2d20::print -t kthread_t
2a101423cc0::findstack
300423b2000::cpuinfo -v
::walk thread
::walk thread |::findstack
::walk cpu |::print cpu_t cpu_thread |::print kthread_t t_pri
0x3000b270078::print -t proc_t p_user.u_psargs
cpu0::print cpu_t cpu_disp |::print disp_t

First or second address in pstack:
ff21fca4::dis

Second address in pstack:
0003cb08::nmadd -f badfunc

Second address in pstack and end address in pstack:
0003cb08::nmadd -f -e 00020dc0 badfunc
0003cb08::dis

Get the register information:
::regs

Display memory leaks and walk the kernel memory log to find leaks:
::findleaks
::walk kmem_log | ::bufctl ! grep tleak
d4db0300::whatis
0x0000000010035a94::whatis -av
::walk kmem_log | ::bufctl -a d4db0300
d4db0300::kgrep | ::whatis -av
80506c0::nmadd -f -e 80506da badfunc

Quit the debugger:$q

An alternate method to invoke the debugger is to pass echoed commands by pipe:
echo "*panicstr/s" |mdb -k unix.0 vmcore.0
echo "*cmm_dbg_buf/s" |mdb -k unix.0 vmcore.0 > ./cmm_dbg_buf.out
echo "$<threadlist" |mdb -k unix.0 vmcore.0 > ./threadlist.out
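
When several commands have to be piped in this way, a small loop can save typing. The following is a minimal sketch under stated assumptions: mdb_batch, the MDB override variable, the DRY_RUN switch and the generated output file names are all hypothetical conveniences, not features of mdb itself. With DRY_RUN=1 the function only prints the pipelines it would run, so it can be tried on a machine without mdb.

```shell
# Hypothetical wrapper: pipe each debugger command into mdb and save one output file per command.
# MDB defaults to "mdb"; DRY_RUN=1 prints the pipelines instead of executing them.
mdb_batch() {
    unix_file=$1
    core_file=$2
    shift 2
    for cmd in "$@"; do
        # Derive a file name from the command by replacing non-alphanumeric characters.
        out=$(printf '%s' "$cmd" | tr -c 'A-Za-z0-9' '_').out
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf "echo '%s' | %s -k %s %s > %s\n" "$cmd" "${MDB:-mdb}" "$unix_file" "$core_file" "$out"
        else
            printf '%s\n' "$cmd" | "${MDB:-mdb}" -k "$unix_file" "$core_file" > "$out"
        fi
    done
}
DRY_RUN=1 mdb_batch unix.0 vmcore.0 '*panicstr/s' '::msgbuf' '$<threadlist'
```

Single quotes around each command keep the shell from touching characters such as `*` and `$` before they reach the debugger.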

OR:fmdump -v

7.4 Service Tool Bundle Service Crash Analysis Tool

Download the Oracle Solaris Service Tool Bundle from the support.oracle.com web portal.

Untar the Package, Access the Directory and Install Service Tool Bundle and Choose the Components to Install (you must Select Service Crash Analysis Tool):./install_stb.sh

Execute the Service Crash Analysis Tool (scat):
cd /var/crash/cldbrm2a
/opt/SUNWscat/bin/scat --scat_explore -a -v unix.1 vmcore.1

OR:/opt/SUNWscat/bin/scat --scat_explore -a -v 1

Access the Directory Created by scat and Analyze the Files:
cd $SCAT_EXPLORE_DATA_DIR
more panic.out
more panic_thread.out
more panic_buf.out
more analyze.out
more coreinfo.out
more cpu-L.out
more dev

An alternate method to use SCAT is to access its Prompt:/opt/SUNWscat/bin/scat 0

Then, at the scat Prompt, analyze the crash dump:SolarisCAT(vmcore.1/11X)> analyze

Get the thread list:SolarisCAT(vmcore.1/11X)> threadlist

Get CPU information:SolarisCAT(vmcore.1/11X)> cpuinfo -v

Get kernel tunables:SolarisCAT(vmcore.1/11X)> tunables

Get the dispatch queues:SolarisCAT(vmcore.1/11X)> dispq

Get ZFS information:SolarisCAT(vmcore.1/11X)> zfs -e

Get ZFS ARC information:SolarisCAT(vmcore.1/11X)> zfs arc

Run Sanity Checks:scat --sanity_checks vmcore.0

scat can include an optional module from which to retrieve the type information.

List Modules:ctf

Dump qlc logs (fp_logq or ssfcp_logq):qlcfc fplog|ssfcplog

Simplify Decoding ddi_devid_t (impl_devid_t) Structures in the Kernel and Display the String Representation of the devid:dev id

Display the Threads that have an Affinity set for a CPU (Specify <cpu> to Show only Threads with Affinity for that <cpu>):tlist affinity <cpu>

7.5 Crash Dump Analysis on Solaris by using ADB

Access the crash dump directory and check files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore

Invoke the debugger:adb -k unix.0 vmcore.0

OR:adb -k 0

At the adb prompt, display the message buffer:
$<msgbuf
msgbuf+14s
msgbuf+10/s

Get the core information:$>coreinfo

Get crash information:
$>system
$>status

Display the panic string:
$>panicinfo
*panicstr/s

Show the crash log:$>crashlog

Get the thread list:$< threadlist

Check the status:$>status

Get the system crash time:$>time/Y

Get the boot time:$>lbolt/X

Get server information:
$<utsname
$<hw_provider/s
$<architecture/s
$<srpc_domain/s

Display stack trace:
$<stack
$<stacktrace

Display stack calls:$<stackcalls

Display stack registers:$<stackregs

Stack traceback:<sp$<stacktrace

Check the root device:rootfs$<bootobj

Check the swapfile device:
swapfile$<bootobj
dumpfile$<bootobj

Display the CPU cycle information:$>cpuinfo -v

Get CPUs:$<cpus

Get process on CPU:$<proconcpu

Get processes running at the moment of the crash:$<proc

Get modules:$<modules

Show open files at the moment of the crash:::pfiles

Display the callouts and the memory walkers:
$>callout
$>walkers

Get the kernel memory structures:$>kmastat

Show memory information at the moment of the crash:
$>memstat
$>memerr
$>meminfo tree process

Show kernel memory segments:$<seglist

Get IPC information:ipcaccess/10i

Get segment map:$>segkmap/J

Show kernel address space:$>kas

Show queues:$<queue

Get filesystem list:$<vfslist

Quit the debugger:$>q

An alternate method to invoke the debugger is to pass echoed commands by pipe:
echo 'msgbuf$<msgbuf' | adb -k unix.0 vmcore.0
echo 'msgbuf,100/s' | adb -k unix.0 vmcore.0
echo '$c' | adb -k unix.0 vmcore.0
echo "<fp$<stackcalls" | adb -k unix.0 vmcore.0
echo "<fp$<stack" | adb -k unix.0 vmcore.0
echo "<fp$<stackregs" | adb -k unix.0 vmcore.0
echo "<fp$<stacktrace" | adb -k unix.0 vmcore.0

7.6 Crash Dump Analysis on Solaris by using Crash

The crash tool is installed as part of the Solaris operating system. The binary is located in /usr/sbin.

Access the crash dump directory and check files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore

Invoke crash tool and output to file:crash -d vmcore.0 -n unix.0 -w /tmp/output_filename

Invoke crash tool to use it by the prompt:crash -d vmcore.0 -n unix.0

OR:crash -d 0

At the crash Prompt

Get the core information:>coreinfo

Get crash information:
>system
>status

Display the panic string:
>panicinfo
*panicstr/s

Show the crash log:>crashlog

Show processes running at the moment of the crash:
>proc
>p -e
>p -l

Get the thread list at the moment of the crash:>threadlist

Check the status:>status

Get CPU information at the moment of the crash:
>cpuinfo
>cpuinfo -v

Show the buffer:>buf

Show the queues:>queue

Get kernel memory structure information at the time of the crash:>kmastat

Get memory information at the time of the crash:
>meminfo
>memerr

Quit the crash tool:<CTRL><D>

7.7 Crash Dump Analysis on Solaris by using ACT

The ACT tool analyzes a system kernel dump and generates a human-readable text summary. It is shipped with all Solaris installation media or with the Service Tool Bundle. To check if it is installed:
pkginfo | grep CTEact

Access the crash dump directory and check files:
cd /var/crash/system_name
ls -lrt
pstack vmcore
file vmcore
strings vmcore

To invoke ACT and output core file to separate files in /tmp/dir:act -d /var/crash/hostname/vmcore.0 -s /tmp/dir/

OR to invoke ACT and output core file to act_out file:act -d /var/crash/hostname/vmcore.0 > /tmp/act_out

OR to invoke ACT and output on live server to screen:act -l

When ACT is invoked to split the core file into the specified directory it creates the following files:
biowait
getblk
modules
msgbuf
mutex
rwlock
threads
system
summary
sunsolve

7.8 Other Crash Dump Analysis Tools on Solaris

On Solaris you can use some common binaries and commands to analyze a crash dump.

Get network status at the time of the crash:netstat unix.0 vmcore.0

Get NFS status at the time of the crash:nfsstat -n unix.0 vmcore.0

Get ARP tables at the time of the crash:arp -a unix.0 vmcore.0

Get IPC status at the time of the crash:/usr/sbin/ipcs -C vmcore.0 unix.0

8. Generate / Analyze a Crash Dump on HP-UX

8.1 Crash Dump Analysis by using KWDB

Check the Crash Dump Directory:ls -lrt /var/adm/crash/c*

Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
cat INDEX
cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:touch /etc/shutdownlog

If there's No Dump, Re-Save it:savecrash -vr /tmp

Verify that kwdb (preferred) or q4 is Installed and Loaded:
swlist -l fileset | grep -i KWDB
swlist -l file | grep contrib
swlist -l fileset | grep -i q4
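
A complementary check, independent of swlist, is to see which of the two debuggers is actually reachable through PATH. The sketch below is an assumption-laden convenience: first_available is a hypothetical function name, and being in PATH says nothing about whether the fileset is correctly installed.

```shell
# Hypothetical helper: print the first command from the argument list that is found in PATH.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}
debugger=$(first_available kwdb q4) || echo "neither kwdb nor q4 is in PATH"
```

Listing kwdb first mirrors the stated preference for kwdb over q4.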

If KWDB is not installed, you can download the official HP depot for your server's HP-UX version and architecture (you need an HP AllianceONE account with appropriate privileges). Then upload the depot to the server and uncompress it:
gunzip KWDB_3.xxxx_depot.gz

Install the depot package for Itanium-based & PA-RISC systems:swinstall -s /KWDB_3.tape_depot KWDB_3

OR for PA-RISC system:swinstall -s /kwdb.pa.depot KWDBPA_3

Analyze the Crash Dump by Using kwdb:
cd /var/adm/crash/crash.#
ls -lrt
kwdb -q4 /var/adm/crash/crash.5

At the kwdb prompt, check the panic string:
(kwdb) examine panicstr using s

Display stack trace with pc and sp (PA-RISC only):(kwdb) pc sp

Get breakpoint info:
(kwdb) info breakpoints
(kwdb) i b

Trace event 0:(kwdb) trace event 0

Trace event 0 with input, local and output registers:(kwdb) trace -args event 0

Load structures:
(kwdb) load struct utsname from &utsname
(kwdb) print -t

Print console message buffer:(kwdb) examine &msgbuf+8 using s

Print the system crash date/time:(kwdb) examine &time using Y

How long had the system been up before the crash:(kwdb) ticks_since_boot/hz
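
The expression above divides a tick counter by the tick rate, which yields the uptime in seconds. As a worked illustration, the sketch below turns such a pair of values into days, hours, minutes and seconds. The function name ticks_to_uptime is hypothetical, and the default of 100 ticks per second is an assumption; use the hz value read from the dump.

```shell
# Hypothetical helper: convert a tick count and a tick rate (hz) into days/hours/minutes/seconds.
# The default hz of 100 is an assumption, not a value taken from any particular kernel.
ticks_to_uptime() {
    ticks=$1
    hz=${2:-100}
    secs=$((ticks / hz))
    printf '%dd %02dh %02dm %02ds\n' \
        $((secs / 86400)) $(((secs % 86400) / 3600)) $(((secs % 3600) / 60)) $((secs % 60))
}
ticks_to_uptime 8640000 100
```

With 8640000 ticks at 100 Hz the division gives 86400 seconds, so the call prints `1d 00h 00m 00s`.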

System load average at the moment of the crash:
(kwdb) examine &avenrun using 3F
(kwdb) examine &real_run using 3F

What command was running the specified process:
(kwdb) load struct proc from 0xb0d240
(kwdb) examine p_cmnd using s
(kwdb) load struct proc from 0x42234040
(kwdb) print -xt p_cmnd
(kwdb) examine 0x41e4db40
(kwdb) print p_comm

How was the kernel built:(kwdb) examine &_makefile_cflags using s

Load the part of the crash event table that contains valid entries and trace them:(kwdb) load crash_event_t from &crash_event_table until crash_event_ptr max 100loaded 4 struct crash_event_table_structs as an array (stopped by “until” clause)(kwdb) trace pile

Load the processor info table and trace every processor (HP-UX v11.11):(kwdb) load mpinfo_t from mpproc_info max nmpinfoloaded 4 struct mpinfos as an array (stopped by max count)(kwdb) trace pile

Page 63: Debugging Core Files Crash Dumps UNIX Linux

OR (post-HP-UX v11.11 kernels):(kwdb) load mpinfou_t from &spu_info max nmpinfo(kwdb) pileon mpinfo_t from pikptr(kwdb) trace pile

Load the processor information table and trace every processor:(kwdb) load mpinfou_t from &spu_info max nmpinfoloaded 1 union mpinfou as an array (stopped by max count)(kwdb) pileon mpinfo_t from pikptrloaded 1 struct mpinfo(kwdb) trace pile

Load the process table and trace the stacks:
    (kwdb) load struct proc from proc_list max nproc next
    (kwdb) trace pile

Load crash event:
    (kwdb) load crash_event_t from &crash_event_table until crash_event_ptr max 100
    (kwdb) print cet_hpa %#x cet_event

Trace event 1:
    (kwdb) trace event 1

Trace event 0 with input, local and output registers:
    (kwdb) trace -args event 0

Load structures:
    (kwdb) load struct utsname from &utsname
    (kwdb) print -t

Check threads:
    (kwdb) load kthread_t from kthread max nkthread
    (kwdb) hist
    (kwdb) load kthread_t from kthread_list max nkthread next kt_factp
    (kwdb) hist
    (kwdb) keep kt_cntxt_flags & TSRUNPROC

Display stack trace for structures from the current pile for process, processor, thread and crash event structures:
    (kwdb) trace pile
    (kwdb) print -tx kt_stat kt_cntxt_flags kt_flag kt_spu addrof kt_procp
    (kwdb) addrof kt_procp

Check running processes (at the time the panic occurred):
    (kwdb) runningprocs

Display stack trace for the process at addr:
    (kwdb) trace process at 7032300014

Trace CPU 3, its threads, spinlocks, calls, etc.:
    (kwdb) trace -v processor 3

Check the state of the processors:
    (kwdb) load mpinfo_t from mpproc_info max nmpinfo
    (kwdb) load mpinfou_t from &spu_info max nmpinfo
    (kwdb) pileon mpinfo_t from pikptr
    (kwdb) call it mpinfo
    (kwdb) print indexof addrof threadp curstate
    (kwdb) exam &mp_avenrun for nmpinfo using 3F
    (kwdb) print indexof addrof held_spinlock spinlock_depth
    (kwdb) load lock_t from 0x129a4c0
    (kwdb) print -x sl_owner sl_lock_caller sl_unlock_caller
    (kwdb) exam sl_lock_caller using a
    (kwdb) exam sl_unlock_caller using a

Recall mpinfo (make the pile saved as mpinfo the current pile):
    (kwdb) recall mpinfo
    (kwdb) print indexof spu_state
    (kwdb) print indexof last_idletime last_tsharetime
    (kwdb) lbolt
    (kwdb) recall mpinfo
    (kwdb) print mp_rq.nready_free mp_rq.nready_locked

Check the per-processor run queues:
    (kwdb) print -t | grep mp_rq
    (kwdb) print -t | grep mp_rq > mprq.out
    (kwdb) load rtsched_info_t from &rtsched_info
    (kwdb) print rts_nready rts_bestq rts_qp rts_numpri
    (kwdb) print -t
    (kwdb) print addrof kt_lastrun_time kt_wchan | sort -k 3n,3 | uniq -c -f2 | grep -v "^ 1" | sort

Trace the specified thread:
    (kwdb) trace thread at 1532338064
    (kwdb) load unwindDesc_t from &$UNWIND_START until &$UNWIND_END max 100000
    (kwdb) maint info unwind panic
    (kwdb) examine &_makefile_cflags using s

Check kernel memory writes and log:
    (kwdb) kmem_writes
    (kwdb) load kmem_log_t from &kmem_log max kmem_log_slots

If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
    /usr/sbin/diag/contrib/pdcinfo

Check the tombstone:
    cd /var/tombstones/
    ls -lrt
    more ts99

Extract the PIM information:
    cstm
    cstm>map
    cstm>sel dev 25
    cstm>info
    cstm>infolog
    Enter Done, Help, Print, SaveAs, or View: [Done] SA
    cstm>quit
    ls -l /tmp/pim.HPMC.16Nov03

8.2 Remote Crash Dump Analysis

    kwdbcr -help
    kwdbcr /var/adm/crash.5

    kwdb -q4 [-m] vmunix <remote_system:port_number | crash_path in remote system>
    kwdb -q4 [-m] vmunix
    (kwdb) target crash <remote_system:port_number | crash_path in remote system>

    more /var/opt/kwdb/kwdbcr.log
    kwdbcr -d -l logfile

8.3 Crash Dump Analysis by using Q4

Q4 is a crash dump analysis tool shipped with the HP-UX OS installation media. It can work alone or in combination with KWDB.

Check the Crash Dump Directory:
    ls -lrt /var/adm/crash/*

Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
    cat INDEX
    cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
    touch /etc/shutdownlog

If there is no dump, re-save it:
    savecrash -vr /tmp

Verify that kwdb (preferred) or q4 is installed and loaded:
    swlist -l fileset | grep -i KWDB
    swlist -l fileset | grep -i q4
    swlist -l file | grep contrib
    type q4

If Q4 is not installed, you can install it from the HP-UX INSTALL media. First, check that the following patches are installed on the corresponding OS versions:
    HP-UX v10.20: PHCO_20261
    HP-UX v11.00: PHCO_20262
    HP-UX v11.11: PHCO_25723

To check whether a patch is installed, run the following command, replacing "xxxxx" with the ID of the patch you are searching for:
    /usr/sbin/swlist -l product | grep PHCO_xxxxx

If needed, you can download the patch from the following locations.

For v10.[12]0 versions:
    ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/10.X/PHCO_20261

For v11.0 versions:
    ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_20262

For v11.11 versions:
    ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_25723

Check that the Q4 fileset is available on the installation media:
    swlist -l fileset -s /cdrom | grep Q4
    OS-Core.Q4 B.10.10 HP-UX Crash Dump Debugger for PA-RISC systems

Select and load it if not loaded:
    swinstall -vs /<CD-ROM mount point> OS-Core.Q4

Prepare the dump tools. For HP-UX 10.20 through 11.11:
    /usr/contrib/bin/q4prep -p

For HP-UX 11.20 and above:
    /usr/contrib/Q4/bin/q4prep -p

For HP-UX 10.10, uncompress and untar the Q4Lib:
    uncompress /usr/contrib/Q4/lib/Q4Lib.tar.Z
    tar -xf /usr/contrib/Q4/lib/Q4Lib.tar

Copy the q4rc.pl sample file to /tmp:
    cp /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /tmp/.q4rc.pl

Once the dump tools are installed and prepared, access the crash dump directory and decompress the dump:
    cd /var/adm/crash/crash.5
    ls -lrt
    gunzip vmunix
    strings vmunix | more
    file vmunix

Set the Environment:
    . /usr/contrib/Q4/bin/set_env

Make a check of vmunix (for HP-UX 11.20 and above):
    /usr/contrib/Q4/bin/q4pxdb -s status vmunix
    pxdb -s status ./vmunix

OR (for HP-UX 10.20 through 11.11):
    /usr/contrib/bin/q4pxdb -s status vmunix

Start analyzing the dump (for HP-UX 11.20 and above):
    /usr/contrib/Q4/bin/q4pxdb vmunix

OR (for HP-UX 10.20 through 11.11):
    /usr/contrib/bin/q4pxdb vmunix

Get panic info and put the output in a file:
    last reboot > reboot.out

Get the list of installed patches and put it in a file:
    swlist -l product | grep -i PH > patches.out

Access the core directory:
    cd /var/adm/crash/core.0
    ls -lrt

Analyze the Crash Dump by Using Q4 (for HP-UX 11.20 and above):
    /usr/contrib/Q4/bin/q4 -p .

OR (for HP-UX 10.20 through 11.11):
    /usr/contrib/bin/q4 -p .

At the q4 prompt, include the analyze.pl script to add more analysis features:
    q4> include analyze.pl

Analyze the dump and put the output in a file:
    q4> run Analyze AU > ana.out

Check the panic cause and put the output in a file:
    q4> run WhatHappened > what.out

If a hang occurred, check the hang cause and put the output in a file:
    q4> run WhatHappened -HANG > whath.out

Exit the q4 prompt:
    q4> exit

If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
    /usr/sbin/diag/contrib/pdcinfo

Check the tombstone:
    cd /var/tombstones/
    ls -lrt
    more ts99

Extract the PIM information:
    cstm
    cstm>map
    cstm>sel dev 25
    cstm>info
    cstm>infolog
    Enter Done, Help, Print, SaveAs, or View: [Done] SA
    cstm>quit
    ls -l /tmp/pim.HPMC.16Nov03

Then analyze the following files:
    more patches.out
    more /etc/shutdownlog
    more /var/tombstones/ts* (if they exist and/or if HPMC was detected)
    more /var/adm/syslog/OLDsyslog.log (if the dump was due to a hang)
    more ana.out
    more what.out
    more whath.out
    more reboot.out
    more crashinfo.out
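Before reading through the files listed above, it can help to confirm which of them were actually produced. This is a small sketch under stated assumptions: the helper name check_review_files is hypothetical, and it assumes all the *.out files were written to a single directory. A missing whath.out is expected when no hang analysis was run.

```shell
#!/bin/sh
# Report which of the review files named above exist and are non-empty.
# dir is the directory where the *.out files were written (default: current).
# check_review_files is a hypothetical helper name.
check_review_files() {
    dir=${1:-.}
    missing=0
    for f in patches.out ana.out what.out whath.out reboot.out crashinfo.out; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when at least one file is missing or empty.
    [ "$missing" -eq 0 ]
}
```

Run check_review_files from the directory where the q4 session was started, or pass that directory as the first argument.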

8.4 Crash Dump Analysis by using KWDB Q4 Mode

KWDB supports a superset of the commands provided by the Q4 crash dump analysis tool, extending its functionality.

Check the Crash Dump Directory:
    ls -lrt /var/adm/crash/*

Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
    cat INDEX
    cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
    touch /etc/shutdownlog

If there is no dump, re-save it:
    savecrash -vr /tmp

Verify that kwdb (preferred) or q4 is installed and loaded:
    swlist -l fileset | grep -i KWDB
    swlist -l fileset | grep -i q4
    swlist -l file | grep contrib
    type q4

If KWDB is not installed, you can download the official HP depot for your server's HP-UX version and architecture from HP (you need an HP AllianceONE account with appropriate privileges). Then upload the depot to the server and uncompress it:
    gunzip KWDB_3.xxxx_depot.gz

Install the depot package for Itanium-based & PA-RISC systems:
    swinstall -s /KWDB_3.tape_depot KWDB_3

OR for PA-RISC system:
    swinstall -s /kwdb.pa.depot KWDBPA_3

If Q4 is not installed, follow the instructions in section 8.3 above.

Access the crash dump directory and analyze the crash dump by using kwdb / q4:
    cd /var/adm/crash/crash.#
    ls -lrt
    kwdb -q4 -p -m .

OR at the kwdb prompt, activate the q4 mode:
    (kwdb) set kwdb q4 on

To disable q4 support, run "set kwdb q4 off" at the q4 prompt.

At the q4 prompt, check the events that occurred immediately before and during the panic and log them to a file:
    q4> run WhatHappened > what.out

If you suspect that a hang occurred, check the hang events by running:
    q4> run WhatHappened -HANG > whath.out

Analyze the dump and log the output to a file:
    q4> run Analyze AU > ana.out

Check the panic string:
    q4> examine panicstr using s

Display stack trace with pc and sp (PA-RISC only):
    q4> pc sp

Get breakpoint info:
    q4> info breakpoints
    q4> i b

Trace event 0:
    q4> trace event 0

Trace event 0 with input, local and output registers:
    q4> trace -args event 0

Load structures:
    q4> load struct utsname from &utsname
    q4> print -t

Print console message buffer:
    q4> examine &msgbuf+8 using s

Print the system crash date/time:
    q4> examine &time using Y

How long had the system been up before the crash:
    q4> ticks_since_boot/hz

System load average at the moment of the crash:
    q4> examine &avenrun using 3F
    q4> examine &real_run using 3F

What command was running the specified process:
    q4> load struct proc from 0xb0d240
    q4> examine p_cmnd using s
    q4> load struct proc from 0x42234040
    q4> print -xt p_cmnd
    q4> examine 0x41e4db40
    q4> print p_comm

How was the kernel built:
    q4> examine &_makefile_cflags using s

Load the part of the crash event table that contains valid entries and trace them:
    q4> load crash_event_t from &crash_event_table until crash_event_ptr max 100
    loaded 4 struct crash_event_table_structs as an array (stopped by "until" clause)
    q4> trace pile

Load the processor info table and trace every processor (HP-UX v11.11):
    q4> load mpinfo_t from mpproc_info max nmpinfo
    loaded 4 struct mpinfos as an array (stopped by max count)
    q4> trace pile

OR (post-HP-UX v11.11 kernels):
    q4> load mpinfou_t from &spu_info max nmpinfo
    q4> pileon mpinfo_t from pikptr
    q4> trace pile

Load the processor information table and trace every processor:
    q4> load mpinfou_t from &spu_info max nmpinfo
    loaded 1 union mpinfou as an array (stopped by max count)
    q4> pileon mpinfo_t from pikptr
    loaded 1 struct mpinfo
    q4> trace pile

Load the process table and trace the stacks:
    q4> load struct proc from proc_list max nproc next
    q4> trace pile

Load crash event:
    q4> load crash_event_t from &crash_event_table until crash_event_ptr max 100
    q4> print cet_hpa %#x cet_event

Trace event 1:
    q4> trace event 1

Load structures:
    q4> load struct utsname from &utsname
    q4> print -t

Check threads:
    q4> load kthread_t from kthread max nkthread
    q4> hist
    q4> load kthread_t from kthread_list max nkthread next kt_factp
    q4> hist
    q4> keep kt_cntxt_flags & TSRUNPROC

Display stack trace for structures from the current pile for process, processor, thread and crash event structures:
    q4> trace pile
    q4> print -tx kt_stat kt_cntxt_flags kt_flag kt_spu addrof kt_procp
    q4> addrof kt_procp

Check running processes (at the time the panic occurred):
    q4> runningprocs

Display stack trace for the process at addr:
    q4> trace process at 0x41978040

Trace CPU 3, its threads, spinlocks, calls, etc.:
    q4> trace -v processor 3

Check the state of the processors:
    q4> load mpinfo_t from mpproc_info max nmpinfo

Recall mpinfo (make the pile saved as mpinfo the current pile):
    q4> recall mpinfo
    q4> print indexof spu_state
    q4> print indexof last_idletime last_tsharetime
    q4> lbolt
    q4> recall mpinfo
    q4> print mp_rq.nready_free mp_rq.nready_locked

Check the per-processor run queues:
    q4> print -t | grep mp_rq > mprq.out
    q4> load rtsched_info_t from &rtsched_info
    q4> print rts_nready rts_bestq rts_qp rts_numpri
    q4> print -t
    q4> print addrof kt_lastrun_time kt_wchan | sort -k 3n,3 | uniq -c -f2 | grep -v "^ 1" | sort

Trace the specified thread:
    q4> trace thread at 1532338064
    q4> load unwindDesc_t from &$UNWIND_START until &$UNWIND_END max 100000
    q4> maint info unwind panic
    q4> examine &_makefile_cflags using s

Check kernel memory writes and log:
    q4> kmem_writes
    q4> load kmem_log_t from &kmem_log max kmem_log_slots

Exit the q4 prompt:
    q4> exit

Run the crashinfo utility, if you have it. It may be in /usr/local/bin or /opt/sfm/tools/. Search for it if you cannot find it:
    find / -type f | grep crashinfo

Run crashinfo and log the output to a file:
    /opt/sfm/tools/crashinfo > crashinfo.out

OR:
    /usr/local/bin/crashinfo > crashinfo.out

OR:
    /opt/sfm/tools/crashinfo -continue | tee crash-43.log

OR pointing to the crash.# directory:
    /opt/sfm/tools/crashinfo /var/adm/crash/crash.5 > crashinfo.out

If the crash dump analysis reveals a hardware issue, you can find the associated tombstone for the system. To save a tombstone:
    /usr/sbin/diag/contrib/pdcinfo

Check the tombstone:
    cd /var/tombstones/
    ls -lrt
    more ts99

Extract the PIM information:
    cstm
    cstm>map
    cstm>sel dev 25
    cstm>info
    cstm>infolog
    Enter Done, Help, Print, SaveAs, or View: [Done] SA
    cstm>quit
    ls -lrt /tmp/pim.HPMC.16Nov03

Then analyze the following files:
    more patches.out
    more /etc/shutdownlog
    more /var/tombstones/ts* (if they exist and/or if HPMC was detected)
    more /var/adm/syslog/OLDsyslog.log (if the dump was due to a hang)
    more ana.out
    more what.out
    more whath.out
    more reboot.out
    more crashinfo.out

To install crashinfo:

crashinfo is part of the SFM (System Fault Management) bundle. There are 2 versions of crashinfo: crashinfo-a-2.exe (64-bit PA2.0) and crashinfo-a-i.exe (IA64). The 64-bit PA2.0 version can be run on both PA and IA64 systems, and can analyze both PA2.0 and IA64 crash dumps. The IA64 version runs only on IA64 systems, but can analyze crash dumps from both IA64 and PA2.0 systems. For performance reasons you may wish to use the IA64 version when running on IA64 systems.

Check if crashinfo is installed on the system:
    ls -lrt /opt/sfm/tools

Download crashinfo:
    /var/adm/crash/depot/SFM-CORE/MISC_TOOLS/opt/sfm/tools/crashinfo-a-2.exe

OR:
    /opt/sfm/tools/crashinfo-a-i.exe

To run crashinfo:
    /opt/sfm/tools/crashinfo > crashinfo.out

    /usr/ccs/bin/pxdb -s status ./vmunix

    /usr/ccs/bin/pxdb ./vmunix

8.5 Crash Dump Analysis by using HP WDB / GDB

The HP Wildebeest Debugger (WDB) is an HP-supported implementation of the Open Source GNU debugger (GDB).

HP WDB / GDB can be used to debug or monitor a process, but it is mostly used to analyze the core files of crashed processes and system crash dumps.

To analyze a system crash dump follow the steps below.

Check the Crash Dump Directory:
    ls -lrt /var/adm/crash/c*

Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
    cat INDEX
    cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
    touch /etc/shutdownlog

If there is no dump, re-save it:
    savecrash -vr /tmp

Check if HP WDB is installed:
    swlist -l fileset | grep -i wdb

If HP WDB is not installed, you can download the latest version (6.3) for your HP-UX version and architecture from HP (you need an HP AllianceONE account with appropriate privileges).

Upload the depot file to the server's /tmp directory, access the directory and decompress it:
    cd /tmp
    gunzip hpwdb.xxxx.xxxx.depot.gz

Install the depot:
    swinstall -s hpwdb.xxxx.xxxx.depot/*

The main paths are:
    /opt/langtools/wdb
    /opt/langtools/gdb
    /opt/langtools/bin

Before analyzing a process core file, check it:
    file corefile_name
    strings corefile_name

Check if it is truncated:
    elfdump -o -S core

If the core file is truncated at 2 GB, the filesystem on which the crash dumps are saved may not support files over that size. Check the syslog file to see whether the system warned that it could not finish saving the file. If that is the problem, you can enable support for files over 2 GB on the specified filesystem:
    fsadm -o largefiles /filesystem_name
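A quick size test can flag a core file that stopped exactly at the 2 GB boundary. The sketch below is an illustration only: the function name is hypothetical, and it assumes the truncation limit is 2^31 - 1 bytes (2147483647), the largest file size available without large-file support.

```shell
#!/bin/sh
# Succeed when the core file size equals the 2 GB limit, a hint that the
# file was cut short by a filesystem without large-file support.
# is_core_at_2gb_limit is a hypothetical helper; the optional second
# argument overrides the assumed limit of 2^31 - 1 bytes.
is_core_at_2gb_limit() {
    core=$1
    limit=${2:-2147483647}
    # wc -c gives the byte count; tr strips the padding some wc versions add.
    size=$(wc -c < "$core" | tr -d ' ')
    [ "$size" -eq "$limit" ]
}
```

For example, `is_core_at_2gb_limit core && echo "core may be truncated"` prints the warning only when the size matches the limit.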

To start analyzing the core dump:
    gdb -c core

OR:
    gdb

At the gdb prompt:
    (gdb) core core

For commands and details refer to section "3.3 Debug Processes and Core Files by using HP WDB / GDB".

8.6 Crash Dump Analysis by using adb

Check the Crash Dump Directory:
    ls -lrt /var/adm/crash/c*

Check the INDEX file and the /etc/shutdownlog file, as they contain the "panic" statement:
    cat INDEX
    cat /etc/shutdownlog

Create the /etc/shutdownlog file if it does not exist:
    touch /etc/shutdownlog

If there is no dump, re-save it:
    savecrash -vr /tmp

Access the crash dump directory and start analyzing the dump (replace crash.5 with the name of the crash.# directory):
    cd /var/adm/crash/crash.5
    ls -lrt
    gunzip vmunix.gz
    strings vmunix | more
    file vmunix
    adb -m vmunix .

OR without accessing the crash dump directory:
    adb -m /var/adm/crash/crash.5/vmunix /var/adm/crash/crash.5
    msgbuf+8/s

At the adb prompt, display the message buffer:
    $<msgbuf
    msgbuf+14/s
    msgbuf+10/s

Get the core information:
    $>coreinfo

Get crash information:
    $>system
    $>status

Display the panic string:
    $>panicinfo
    *panicstr/s

Show the crash log:
    $>crashlog

Get the thread list:
    $< threadlist

Check the status:
    $>status

Quit the debugger:
    $>q

9. Generate / Analyze a Crash Dump on Linux

9.1 Enable Saving Crash Dump by using kexec-tools

Check the Presence of kdump Tool:
    yum search kexec-tools
    chkconfig --list | grep kdump
    more /etc/kdump.conf

OR:
    /etc/init.d/kdump status

If necessary, add the following line to the Yum repository file on Red Hat:
    vi /etc/yum.repos.d/rhel-debuginfo.repo
    baseurl=ftp://ftp.redhat.com/pub/redhat/linux/enterprise/$releasever/en/os/$basearch/Debuginfo/

Enable Repository:
    yum install --enablerepo rhel-debuginfo httpd-debuginfo

OR for CentOS:
    vi /etc/yum.repos.d/centos-debuginfo.repo
    baseurl=http://debuginfo.centos.org/$releasever/$basearch/

Enable Repository:
    yum install --enablerepo centos-debuginfo httpd-debuginfo

Install kexec-tools:
    yum install kexec-tools

Check or edit the /etc/kdump.conf file according to your needs:
    vi /etc/kdump.conf
    more /etc/sysconfig/kdump

Back up /boot/grub/grub.conf, then edit it and append "crashkernel=128M@16M" to the end of the kernel line:
    cp /boot/grub/grub.conf /boot/grub/grub.conf.bkp
    vi /boot/grub/grub.conf

OR:
    cp /boot/grub/menu.lst /boot/grub/menu.lst.bkp
    vi /boot/grub/menu.lst
    kernel /boot/vmlinuz-2.6.18-128.1.16.el5 ro root=LABEL=/ rhgb quiet crashkernel=128M@16M

Enable kdump Service:
    chkconfig kdump on
    chkconfig kdump
    chkconfig --list | grep kdump

OR:
    /etc/init.d/kdump start
    /etc/init.d/kdump status

Reboot the System:
    reboot
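After the reboot, it is worth confirming that the kernel really booted with the crashkernel parameter. A minimal sketch follows, assuming the boot parameters of the running kernel are readable from /proc/cmdline; the function name crashkernel_setting is hypothetical.

```shell
#!/bin/sh
# Print the crashkernel= setting found in a kernel command line file.
# cmdline_file defaults to /proc/cmdline (boot parameters of the running
# kernel). Returns non-zero when no crashkernel= parameter is present.
# crashkernel_setting is a hypothetical helper name.
crashkernel_setting() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each parameter on its own line, then keep the crashkernel one.
    tr ' ' '\n' < "$cmdline_file" | grep '^crashkernel='
}
```

If `crashkernel_setting` prints nothing and returns a non-zero status, the grub edit did not take effect and kdump has no reserved memory to load the capture kernel into.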

9.2 Simulate a Panic and Save a Crash Dump

There are different ways to simulate a panic. The following are the most common:

    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger

OR:
    echo 1 > /proc/sys/kernel/sysrq

On the system console type:
    Alt-SysRq-u

All filesystems will be remounted read-only: this saves the system from running fsck on all the file systems when it reboots.

On the system console type:
    Alt-SysRq-c

This forces the system to panic and a crash dump to be taken.

9.3 Analyze Crash Dump by using crash

On CentOS 5 and 6, download and install the kernel-debuginfo and kernel-debuginfo-common packages:
    wget http://debuginfo.centos.org/5/`uname -i`/kernel-debuginfo-`uname -r`.i686.rpm
    wget http://debuginfo.centos.org/5/`uname -i`/kernel-debuginfo-common-`uname -r`.i686.rpm

OR for Red Hat 5:
    wget ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kexec-tools-debuginfo-1.102pre-96.el5_5.4.x86_64.rpm

OR for SuSE:
    wget ftp5.gwdg.de/pub/opensuse/repositories/Kernel:/kdump/openSUSE_11.1/x86_64/kexec-tools-debuginfo-2.0.0-58.1.x86_64.rpm

Install the downloaded packages:
    rpm -Uvh kernel-debuginfo*

Check the crash dump files:
    cd /var/crash/2009-06-09-20\:18
    ls -lrt
    file vmcore
    strings vmcore

Start crash and analyze the output:
    crash /usr/lib/debug/lib/modules/crashed-kernel-version/vmlinux /var/crash/2009-06-09-20\:18/vmcore

OR:
    crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/2009-06-09-20\:18/vmcore | tee /var/crash/crash3.log

At the crash prompt, view system data:
    crash> sys

Get Info About Open Files:
    crash> files

Display Processes Status:
    crash> ps

Display Virtual Memory Info:
    crash> vm

View Stack Traces:
    crash> bt -a

Display Modules Info and Loading of Symbols and Debugging Data:
    crash> mod

Dump Kernel Log Buffer Contents in Chronological Order:
    crash> log

Analyze EIP Address (from the Preceding Output):
    crash> dis -lr c04a9c34

Exit:
    crash> exit

Running crash in Unattended Mode

You can run crash in unattended (non-interactive) mode by creating an input file containing the commands you want to pass to crash.

Generate an input file containing the commands:
    vi inputfile
    bt
    log
    ps
    exit

Run crash:
    crash -i inputfile

OR:
    crash < inputfile

OR:
    crash <debuginfo> vmcore < inputfile > outputfile

OR:
    crash <System map> <vmlinux> vmcore < inputfile > outputfile
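For a fully unattended run, the input file can be written by a script instead of an editor. This sketch reuses the four commands listed above; the function name write_crash_input is hypothetical and the output file name is a parameter.

```shell
#!/bin/sh
# Write a crash input file without opening an editor.
# The four commands are the ones listed above; the target file name is
# the first argument (default: inputfile).
# write_crash_input is a hypothetical helper name.
write_crash_input() {
    out=${1:-inputfile}
    cat > "$out" <<'EOF'
bt
log
ps
exit
EOF
}
```

After calling `write_crash_input inputfile`, pass the file to crash with one of the invocations shown above, for example `crash <debuginfo> vmcore < inputfile > outputfile`.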

9.4 Analyze Crash Dump by using GDB

Check the crash dump files:
    cd /var/crash/2009-06-09-20\:18
    ls -lrt
    file vmcore
    strings vmcore

Start gdb on the core file:
    gdb -c core

OR:
    gdb a.out core

OR:
    gdb path/to/the/binary path/to/the/core
    objdump -d -S null-pointer.ko > /tmp/whatever

OR from the gdb prompt:
    (gdb) core core

At the gdb prompt, analyze the backtrace:
    (gdb) bt

Select a stack frame by its number:
    (gdb) frame number

View the code around that frame:
    (gdb) list

List local variables:
    (gdb) info locals

gdb groups its commands into help classes. The class names are not commands themselves; to list the commands in a class, pass its name to help:
    (gdb) help status
    (gdb) help data
    (gdb) help stack
    (gdb) help files
    (gdb) help internals
    (gdb) help aliases
    (gdb) help support
    (gdb) help running

Quit the debugger:
    (gdb) quit

9.5 Analyze Crash Dump by using LKCD

The Linux Kernel Crash Dump (LKCD) is a project that provides a reliable method of detecting, saving and examining system crashes.

Download the current lkcdutils rpm and the patches from the LKCD project and upload the packages onto the server. Installing LKCD requires applying the kernel patches, building a new kernel and installing the LKCD utilities.

Make a copy of the kernel source directory:
    cp -r /usr/src/linux-x.x.x /usr/src/linux-x.x.x.lkcd

Access the newly-created directory:
    cd /usr/src/linux-x.x.x.lkcd

Test the patches:
    patch -p1 --dry-run < <path>/lkcd-x.x.x.diff

If the previous command did not report any errors, then apply the kernel patches:
    patch -p1 < <path>/lkcd-x.x.x.diff

Configure the kernel, adding LKCD support (compiled into the kernel, not as a module) and enabling the Magic SysRq keys (this is not mandatory, but it allows a crash dump to be triggered when the system has hung):
    make menuconfig

Navigate to "Kernel Hacking" and press <enter>.
Navigate to "Magic SysRq key" and press <space>: an asterisk should appear next to the line.
Navigate to "Linux Kernel Crash Dump (LKCD)" and press <space> until an asterisk appears. If compression options are presented, select all available.
Press <tab> <enter> twice until you are prompted to save the configuration: press <enter> to save and exit menuconfig.

Build the new kernel:
    make dep; make bzImage

Install the kernel image:
    make install

The kernel build process will have built the file Kerntypes in the kernel source directory: check whether this file was copied to the /boot directory and, if needed, copy it yourself:
    cp Kerntypes /boot

The kernel build process builds the file System.map in the kernel build directory, and the kernel install process copies this file into the /boot directory: check that /boot/System.map matches the copy in the kernel source directory:
    diff System.map /boot/System.map

If the two files do not match, then make a fresh copy in the /boot directory:
    cp System.map /boot

Reboot with the new kernel:
    init 6

Once the system is up and running, check that the /proc/sys/dump directory exists:
    ls -d /proc/sys/dump

If the directory is missing, the kernel has not been patched or configured properly for LKCD.

Once the kernel is patched, install the LKCD Utilities rpm:
    rpm -i lkcdutils-x_x-x_xxxx.rpm

Edit the system startup script: /etc/rc.sysinit on Red Hat and CentOS, or /sbin/init.d/boot on SuSE (to find the system startup script for your distribution, issue the command "grep sysinit /etc/inittab"). Locate the line:
    action $"Mounting local filesystems: " mount -a -t nonfs,smbfs,ncpfs

After this line add:
    /sbin/lkcd config

If you are using a swap partition as the dump device, the dump must be saved before swap is activated. Locate the line with the "swapon" command in the system startup script and change it like this (adding the lkcd commands above it):
    /sbin/lkcd config
    /sbin/lkcd save
    # Start up swapping.
    action $"Activating swap partitions: " swapon -a -e

Configure the device on which to save the crash dump by creating a symbolic link to the chosen device and updating the LKCD configuration:
    df -k
    cat /proc/partitions
    ln -s /dev/sdb1 /dev/vmdump
    /sbin/lkcd config

Enable the Magic SysRq key with the following command:
    echo 1 > /proc/sys/kernel/sysrq

Check or edit the configuration file according to your needs:
    vi /etc/sysconfig/dump

The parameter DUMP_ACTIVE must be set to 1 to enable the dump process.
Set DUMP_SAVE to 1 if you want to save the memory image to disk.
Define the DUMP_LEVEL: 0 nothing, 1 dump the dump header and first 128K bytes out, 4 dump everything except the kernel free pages, 8 dump all memory.
Set DUMP_COMPRESS to 0 if you do not want the dump to be compressed, to 1 if you want to use RLE compression or to 2 for gzip compression.

An example of a dump configuration file:
    DUMP_ACTIVE=1
    DUMPDEV=/dev/vmdump
    DUMPDIR=/var/log/dump
    DUMP_SAVE=1
    DUMP_LEVEL=8
    DUMP_FLAGS=0
    DUMP_COMPRESS=0
    PANIC_TIMEOUT=5
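The settings described above can be sanity-checked before relying on them. The following is a sketch, not part of the LKCD utilities: both function names are hypothetical, and it assumes the file uses plain unquoted KEY=VALUE lines like the example.

```shell
#!/bin/sh
# Read one KEY=VALUE setting from an LKCD-style configuration file.
# $1 = configuration file, $2 = key; prints the value of the last match.
# Assumes unquoted values, as in the example configuration above.
dump_setting() {
    sed -n "s/^$2=//p" "$1" | tail -1
}

# Summarise whether the configuration would save a usable dump.
# check_dump_config and dump_setting are hypothetical helper names.
check_dump_config() {
    conf=${1:-/etc/sysconfig/dump}
    active=$(dump_setting "$conf" DUMP_ACTIVE)
    save=$(dump_setting "$conf" DUMP_SAVE)
    level=$(dump_setting "$conf" DUMP_LEVEL)
    echo "DUMP_ACTIVE=$active DUMP_SAVE=$save DUMP_LEVEL=$level"
    # Succeed only when dumping is enabled, saved to disk, and not level 0.
    [ "$active" = 1 ] && [ "$save" = 1 ] && [ "${level:-0}" != 0 ]
}
```

Running `check_dump_config` against the example file prints the three values and returns success; a non-zero status means at least one of them would prevent a dump from being saved.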

After changing the configuration, update it and enable crash dump saving:
    lkcd config

Check Configuration Settings:
    lkcd query

Set up the service to start at boot:
    chkconfig boot.lkcd on

Test LKCD. On the system console type:
    Alt-SysRq-u

All filesystems will be remounted read-only: this saves the system from running fsck on all the file systems when it reboots.

On the system console type:
    Alt-SysRq-c

This forces the system to panic and a crash dump to be taken. If the system startup scripts do not contain the "lkcd save" command, create the dump files manually:
    /sbin/lkcd save

Once the system is back up and running, check that the dump files have been created:
    cd /var/log/dump/0
    ls -lrt

Invoke LKCD lcrash:
    /sbin/lcrash map.0 dump.0 kerntypes.0

OR:
    /sbin/lcrash -n 0

At the lcrash prompt, get a list of the processes running at the time of the crash:
    >>ps

Display system statistics and the log_buf array:
    >>stat

Display the crash dump report:
    >>report
    >>report -w outfile

Display dump:
    >>dump
    >>dump c02e4820 8 -o
    >>dump c02e4820 8 -d
    >>dump c02e4820 8 -x

List opened namelists:
    >>namelist
    >>namelist -a /tmp/snd.o

Display module information:
    >>module
    >>module pcmcia_core
    >>module pcmcia_core -f
    >>module kernel_module -f -i 10

Display page structure information:
    >>page

Evaluate and print expressions:
    >>print

Dynamically load a library of lcrash commands:
    >>ldcmds

Display all complete and unique stack traces:
    >>strace

Display stack trace for task_struct:
    >>trace

Display information for task_struct structs:
    >>task

List symbol table information:
    >>symtab -l

List symbols in the specified module:
    >>symtab -l -f /tmp/my_dummy.map

Remove a symbol table:
    >>symtab -r /tmp/my_dummy.map

Recreate and reload a symbol table:
    >>symtab -a __ksymtab__
    >>symtab -a /tmp/my_dummy.map my_dummy
    >>symtab -l

Walk a linked list of kernel structures or memory blocks:
    >>walk

Examine a local variable:
    >>whatis DUMMY
    >>print *(dummy_t*) d0000240
    >>whatis dummy_s.member2

Display disassembled code:
    >>dis -F memcmp
    >>dis 0xc025188e 10 -f

Quit lcrash:
    >>q

9.6 Other Useful Commands

Examining the running kernel after a crash can be useful to check whether it is experiencing issues:
    cat /proc/sys/kernel/tainted
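The tainted value is a bitmask, so a non-zero number is easier to read once it is split into bit positions. The sketch below only lists which bits are set; the function name taint_bits is hypothetical, and the meaning of each bit depends on the kernel version, so it should be looked up in that kernel's documentation.

```shell
#!/bin/sh
# List which bit positions are set in a kernel taint value.
# A value of 0 means the kernel is not tainted.
# taint_bits is a hypothetical helper name.
taint_bits() {
    value=$1
    bit=0
    bits=""
    while [ "$value" -gt 0 ]; do
        # Test the lowest bit, then shift right by one position.
        if [ $((value % 2)) -eq 1 ]; then
            bits="$bits $bit"
        fi
        value=$((value / 2))
        bit=$((bit + 1))
    done
    if [ -z "$bits" ]; then
        echo "not tainted"
    else
        echo "tainted, bits set:$bits"
    fi
}
```

A typical call is `taint_bits "$(cat /proc/sys/kernel/tainted)"`.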

If a module, a library or a program is suspected of having caused a panic, you can dump/disassemble it:
    objdump -D -S <compiled_object_with_debug_symbols> > filename.out

10. Generate / Analyze a Crash Dump on IBM AIX

10.1 Setup and Enable KDB

KDB is an interactive kernel debugger shipped with the IBM AIX operating system.

kdb allows the user to control the execution of kernel code (including kernel extensions and device drivers), and to observe and modify variables and registers. It has to be invoked by a special boot image.

The kdb command is a tool for analysing system dumps. It is used for post-mortem analysis of system dumps, or for monitoring the running kernel.

Check Current Dump Device(s):
    sysdumpdev -l

Start the System Dump:
    sysdumpstart -p

Check the Minimum Size for the Dump Device:
    sysdumpdev -e

Enable the KDB, but do not invoke it at boot:
    bosboot -a -d /dev/ipldevice -D

Enable the KDB and invoke it at boot:
    bosboot -a -d /dev/ipldevice -I

Disable the KDB:
    bosboot -a -d /dev/ipldevice

Check if KDB is available:
    kdb
    (0)> dw kdb_avail
    (0)> dw kdb_wanted

Find the Dump Object:
    lsnim -l worker

10.2 Analyze a Crash Dump by using KDB

Check Current Dump Device(s):
    sysdumpdev -l

Check if KDB is available:
    kdb
    >dw kdb_avail

Find the Dump Object:
    lsnim -l worker

Access the crash dump directory:
    cd /var/crash/
    ls -lrt

View the content of the snap package:
    zcat snap.pax.Z | pax -v

Extract the content of the snap package:
    zcat snap.pax.Z | pax -r

OR extract just the dump, general, and kernel subdirectories:
    uncompress snap.pax.Z
    zcat snap.pax.Z | pax -r ./dump ./general ./kernel

Check the Timestamps of Dump and UNIX Files:
    what unix | grep _kdb_buildinfo
    what dump | grep _kdb_buildinfo
    what /usr/sbin/kdb_64 | grep _kdb_buildinfo
    what /usr/sbin/kdb_mp | grep _kdb_buildinfo

Analyze a Core:
    kdb /var/adm/ras/vmcore.0 /unix

At kdb Prompt, Display system statistics that include the last kernel printf() messages still in memory:>stat

Display all of the stack frames from the current instruction as deep as possible (interrupts, system calls, user stack):>f

Display informations about what’s currently running on each processor:>status

Display the symptom string for a dump:>symptom

Show system log entries not processed by the log daemon:>errpt

Show the global error-logging control information:
    > errlg -g

Show the error-logging control information for the specified address:
    > errlg -a address

Show dump-time trace information:
    > dmptrc

Display information about the Lightweight Memory Trace (LMT):
    > mtrc all -v

Display information about the Lightweight Memory Trace (LMT) for CPU 0:
    > mtrc -C 0 -v

Dump the event buffer on channel 2, related to Thread ID 14539, for an active system trace:
    > trace -c 2 -t 14539

Initial CPU Context:
    > cpu 1

Get Breakpoints:
    > brk

Display the Stack in Raw Format:
    > dw @r1 90

Display All of the Function Addresses:
    > devsw

Display data at utsname:
    > dw utsname

Find physical address at utsname:
    > tr utsname

Get Machine State:
    > mst

Get Machine State Register:
    > mrs

Dump the Content of the Machine State Register:
    > dr msr

VMM Error Log:
    > vmlog

Display information about component dump tables in a system memory dump:
    > cdt
    > cdt 11
    > cdt -p 11 7

Process Info:
    > proc *

Display file table:
    > file

Print “intr” Symbol:
    > pr -p intr

Show symbols matching “*r”:
    > pr -p *r

Print following the next pointer:
    > pr -l next intr 30047A80

Display inode table:
    > ino

Print details of the inode pool:
    > jfsnode

Print gfs slot 1:
    > gfs gfs

Display either the Enhanced Journaled File System (JFS2) d-tree or x-tree structure based on the specified inode parameter:
    > tree 325C1080

Print gfs slot 2:
    > gfs gfs+30

Print gfs slot 3:
    > gfs gfs+60
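The three gfs examples above follow a pattern: slot 1 is at gfs, slot 2 at gfs+30, and slot 3 at gfs+60, with offsets in hexadecimal. Assuming that this 0x30-byte stride holds for further slots (it is inferred from the examples, not taken from documentation, and may differ between AIX levels), the argument for slot N can be computed as sketched below. The function name gfs_slot_expr is hypothetical.

```shell
# Sketch: print the kdb "gfs" argument for a 1-based slot number.
# Assumes each slot is 0x30 (48) bytes apart, as the examples suggest.
gfs_slot_expr() {
    # Accept only plain decimal digits.
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: gfs_slot_expr <slot number, starting at 1>" >&2
            return 1
            ;;
    esac
    if [ "$1" -lt 1 ]; then
        echo "slot numbers start at 1" >&2
        return 1
    fi

    offset=$(( ($1 - 1) * 48 ))     # 48 decimal = 0x30, the inferred stride
    if [ "$offset" -eq 0 ]; then
        echo "gfs"
    else
        printf 'gfs+%X\n' "$offset" # kdb offsets are written in hex
    fi
}
```

For example, "gfs_slot_expr 3" prints gfs+60, which can then be passed to the gfs subcommand at the kdb prompt.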

pid Output:
    > p 3

Get Threads:
    > thread *

Print current thread:
    > tpid

Show VMM free list information:
    > freelist

Tid Output:
    > th 12

kdb Output:
    > p *

Get the Address of the Symbol and Table Of Contents Section of the Executable Module:
    > nm
    > nm vmerrlog

Display the inpcb structure for TCP connections:
    > tcb -s

Display the inpcb structure for UDP connections:
    > udb -s

Print the socket structure for TCP and UDP sockets:
    > sock -s

Display mbuf data structure information (mbufs are used to store data in the kernel for incoming and outbound network traffic):
    > mbuf -p

Display mbuf data structure information and follow the packet chain:
    > mbuf -a

Follow the mbuf structure within a packet:
    > mbuf -n effectiveaddress

Display the list of all valid network device driver tables, with the address of each ndd structure and the name of the corresponding network interface:
    > ndd -s

Display network connections at the time of the crash:
    > netstat -an

Display network interface information:
    > ifnet

Display the list of kernel data structure checkers:
    > check

Display information about the specified kernel data structure checker:
    > check -h proc

Run the proc checker to validate the entire process table:
    > check -l 7 proc

Exit the KDB:
    > g
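For a first look at a dump, a few of the subcommands above can be collected into a command list and fed to kdb in one pass. The sketch below only generates that list. It assumes that kdb reads subcommands from standard input when input is redirected; the function name kdb_triage_commands and the choice of subcommands are illustrative.

```shell
# Sketch: emit a first-look list of kdb subcommands, one per line.
# The selection (stat, status, f, symptom) comes from the list above;
# the helper name is hypothetical.
kdb_triage_commands() {
    printf '%s\n' stat status f symptom
}

# Hypothetical usage on AIX, assuming kdb accepts subcommands on stdin:
#   kdb_triage_commands | kdb /var/adm/ras/vmcore.0 /unix > triage.out
```

Keeping the list in one function makes it easy to add subcommands such as errpt or vmlog for a particular investigation.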

11. Debugging Tools

11.1 Information

In this section you can find a collection of debugging tools for the main UNIX and Linux operating systems.

GDB: The GNU Project Debugger allows you to see what is going on inside another program while it executes, or what another program was doing at the moment it crashed. It is available for different UNIX operating systems and Linux distributions.

HP tusc: tusc traces system calls invoked by a process. It works with HP-UX 11.0 and 11i PA-RISC systems, and HP-UX 11i HP Integrity systems. It is not supported on HP-UX 10.20. tusc is similar in functionality to truss on Solaris.

HP Wildebeest Debugger (WDB): HP WDB is an HP-supported implementation of the Open Source GNU debugger (GDB). It is available for different HP-UX versions and architectures.

Linux Kernel Crash Dump: LKCD is a project designed to meet the needs of customers and system administrators who want a reliable method of detecting, saving, and examining system crashes. It is available for different Linux distributions.

DTrace Toolkit: DTrace Toolkit is a collection of DTrace scripts for debugging and deep-diving into a system. The current version is 0.99.

DTrace TazTool: DTrace TazTool is the DTrace version of the program “taztool”, a disk trace tool developed by Richard McDougall which takes the TNF disk trace records and matches them up in pairs for the start and end of a disk transaction. DTrace TazTool can be thought of as an evolution of taztool; the latest version is 0.51. taztool 1.1 is also available (its package name is RMCtaz).

Dexplorer: DExplorer automatically runs a collection of DTrace scripts to examine many areas of the system, and places the output in a meaningful directory structure that is tar'd and gzip'd. The current version is 0.70.

Lsof for HP: Lsof lists files, sockets, inodes, etc. opened by processes.

Lsof for Solaris: You can find lsof for Solaris 10 SPARC and x86 on http://www.sunfreeware.com; you have to create a free account to download packages from this site. Packages for previous versions of Solaris are available on http://unixpackages.com/; these packages are not freeware, as you need to buy a subscription (a single-user subscription currently costs $20/year).

SE Toolkit: The SE Toolkit is a collection of scripts for performance analysis that also gives advice on performance improvement. It has been a standard in system performance monitoring for the Solaris platform over the last 10 years.

XE Toolkit: The XE Toolkit is a multi-platform, network-aware, secure performance monitoring solution for tactical analysis of enterprise computing systems.

NMON: This Solaris system monitoring tool allows you to perform standard SAR activity reporting and NMON activity reporting. The NMON output can be imported into Excel or RRD to produce simple and efficient graphs.

Ksar: ksar is a sar graphing tool that can currently graph Linux, Mac, and Solaris sar output. The sar statistics graphs can be output to a PDF file.

Sar2html: sar2html converts sar binary data to graphical HTML format. It has a command line, a web interface, and a data collection script. HP-UX 11.11, 11.23, 11.31; Red Hat 3, 4, 5, 6; SUSE 8, 9, 10, 11; and Solaris 5.9, 5.10 are supported.

Sarface: sarface is a user interface to the sysstat/sar database which takes data from sar and plots it to a live X11 graph via gnuplot. It mimics the command-line options of sar but can cross-plot any two or more stats and apply simple mathematical functions to them.

Visual SAR: Visual SAR is a Java graphical interpreter for the Unix sar command. It reads sar output from a file and shows it in a graphical format. Visual SAR allows a quick interpretation of a server's behavior over several days.

Sarvant: Sarvant (SAR Visual ANalysis Tool) is a Python script that analyzes a sar file (from the sysstat utility, 'sar') and produces graphs of the collected data using gnuplot.

Sarparse: Sarparse is a utility based on Cacti for graphing sar metrics from remote hosts. It requires NRPE and SAR to run out of the box but could easily be modified for any other transport.