If a system running the Solaris Operating System crashes, provide your service provider with as much information as possible, including crash dump files.
Solaris 10: kmdb has replaced kadb as the standard “in situ” Solaris kernel debugger.
kmdb brings all the power and flexibility of mdb to live kernel debugging. kmdb supports the following:
Debugger commands (dcmds)
Debugger modules (dmods)
Access to kernel type data
Kernel execution control
Inspection
Modification
For more information, see the kmdb(1) man page. For step-by-step instructions on using kmdb to troubleshoot a system, see How to Boot the System With the Kernel Debugger (kmdb) in System Administration Guide: Basic Administration.
Solaris 10: The Solaris DTrace facility is a comprehensive dynamic tracing facility that gives you a new level of observability into the Solaris kernel and user processes. DTrace helps you understand your system by permitting you to dynamically instrument the OS kernel and user processes to record data that you specify at locations of interest, called probes. Each probe can be associated with custom programs that are written in the new D programming language. All of DTrace's instrumentation is entirely dynamic and available for use on your production system. For more information, see the dtrace(1M) man page and the Solaris Dynamic Tracing Guide.
The most important things to remember are:
Write down the system console messages.
If a system crashes, making it run again might seem like your most pressing concern. However, before you reboot the system, examine the console screen for messages. These messages can provide some insight about what caused the crash. Even if the system reboots automatically and the console messages have disappeared from the screen, you might be able to check these messages by viewing the system error log, the /var/adm/messages file.
For more information about viewing system error log files, see How to View System Messages.
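As a rough illustration of checking the system error log after an automatic reboot, the following sketch filters a log file for lines that mention a panic. The function name crash_messages and the default search pattern "panic" are assumptions made for this example and are not part of Solaris; adjust the pattern to match the messages that your system actually logs.

```shell
# crash_messages [FILE] [PATTERN]
# Hypothetical helper (not a Solaris command): print the lines of a
# system error log that contain PATTERN, ignoring case.
# FILE defaults to /var/adm/messages; PATTERN defaults to "panic" (assumed).
crash_messages() {
    logfile=${1:-/var/adm/messages}
    pattern=${2:-panic}

    # Fail early with a clear message if the log cannot be read.
    if [ ! -r "$logfile" ]; then
        echo "crash_messages: cannot read $logfile" >&2
        return 1
    fi

    grep -i "$pattern" "$logfile"
}
```

Running crash_messages with no arguments searches /var/adm/messages. Passing a second argument, for example crash_messages /var/adm/messages warning, searches the same file for a different pattern.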
If you have frequent crashes and can't determine their cause, gather all the information you can from the system console or the /var/adm/messages files, and have it ready for a customer service representative to examine. For a complete list of troubleshooting information to gather for your service provider, see Troubleshooting a System Crash.
Synchronize the disks and reboot.
ok sync
If the system fails to reboot successfully after a system crash, see Chapter 25, Troubleshooting Miscellaneous Software Problems (Tasks).
Check to see if a system crash dump was generated after the system crash. System crash dumps are saved by default. For information about crash dumps, see Chapter 24, Managing System Crash Information (Tasks).
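To illustrate the crash dump check, the sketch below lists saved dump files in a directory. It assumes the default savecore layout, in which dumps are written to /var/crash/hostname as unix.N and vmcore.N pairs. Running dumpadm with no arguments reports the savecore directory that is actually configured on your system. The function name list_crash_dumps is hypothetical.

```shell
# list_crash_dumps [DIR]
# Hypothetical helper (not a Solaris command): list saved crash dump files.
# Assumes the default savecore layout: DIR/unix.N and DIR/vmcore.N.
# DIR defaults to /var/crash/<hostname>.
list_crash_dumps() {
    dumpdir=${1:-/var/crash/`uname -n`}

    # Unmatched patterns are passed to ls literally; discard the resulting
    # "No such file" errors so that an empty directory prints nothing.
    ls "$dumpdir"/unix.* "$dumpdir"/vmcore.* 2>/dev/null
}
```

An empty result means only that no dump files were found in that directory. It does not by itself show that no dump was taken, because the savecore directory might have been changed from the default.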
Answer the following questions to help isolate the system problem. Use the Troubleshooting a System Crash Checklist to gather troubleshooting data for a crashed system.
Table 21.1. Identifying System Crash Data
Question | Description |
---|---|
Can you reproduce the problem? | This is important because a reproducible test case is often essential for debugging really hard problems. By reproducing the problem, the service provider can build kernels with special instrumentation to trigger, diagnose, and fix the bug. |
Are you using any third-party drivers? | Drivers run in the same address space as the kernel, with all the same privileges, so they can cause system crashes if they have bugs. |
What was the system doing just before it crashed? | If the system was doing anything unusual like running a new stress test or experiencing higher-than-usual load, that might have led to the crash. |
Were there any unusual console messages right before the crash? | Sometimes the system will show signs of distress before it actually crashes; this information is often useful. |
Did you add any tuning parameters to the | Sometimes tuning parameters, such as increasing shared memory segments so that the system tries to allocate more than it has, can cause the system to crash. |
Did the problem start recently? | If so, did the onset of problems coincide with any changes to the system, for example, new drivers, new software, different workload, CPU upgrade, or a memory upgrade? |