Troubleshooting the Service Management Facility

Debugging a Service That Is Not Starting

In this procedure, the print service is disabled.

  1. Become superuser or assume a role that includes the Service Management rights profile.

    Roles contain authorizations and privileged commands. For more information about roles, see Configuring RBAC in System Administration Guide: Security Services .

  2. Request information about the hung service.

    # svcs -xv
    svc:/application/print/server:default (LP Print Service)
     State: disabled since Wed 13 Oct 2004 02:20:37 PM PDT
    Reason: Disabled by an administrator.
       See: http://sun.com/msg/SMF-8000-05
       See: man -M /usr/share/man -s 1M lpsched
    Impact: 2 services are not running:
            svc:/application/print/rfc1179:default
            svc:/application/print/ipp-listener:default

    The x option provides additional information about the service instances that are impacted.

  3. Enable the service.

    # svcadm enable application/print/server
    

How to Repair a Corrupt Repository

This procedure shows how to replace a corrupt repository with a default copy of the repository. When the repository daemon, svc.configd, is started, it does an integrity check of the configuration repository. This repository is stored in /etc/svc/repository.db. The repository can become corrupted due to one of the following reasons:

  • Disk failure

  • Hardware bug

  • Software bug

  • Accidental overwrite of the file

If the integrity check fails, the svc.configd daemon writes a message to the console similar to the following:

svc.configd: smf(5) database integrity check of:

    /etc/svc/repository.db

  failed.  The database might be damaged or a media error might have
  prevented it from being verified.  Additional information useful to
  your service provider is in:

    /etc/svc/volatile/db_errors

  The system will not be able to boot until you have restored a working
  database.  svc.startd(1M) will provide a sulogin(1M) prompt for recovery
  purposes.  The command:

    /lib/svc/bin/restore_repository

  can be run to restore a backup version of your repository. See
  http://sun.com/msg/SMF-8000-MY for more information.

The svc.startd daemon then exits and starts sulogin to enable you to perform maintenance.

  1. Enter the root password at the sulogin prompt. sulogin enables the root user to enter system maintenance mode to repair the system.

  2. Run the following command:

    # /lib/svc/bin/restore_repository
    

    Running this command takes you through the necessary steps to restore a non-corrupt backup. SMF automatically takes backups of the repository at key system moments. For more information see SMF Repository Backups.

    When started, the /lib/svc/bin/restore_repository command displays a message similar to the following:

    Repository Restore utility
    See http://sun.com/msg/SMF-8000-MY for more information on the use of
    this script to restore backup copies of the smf(5) repository.
    
    If there are any problems which need human intervention, this script
    will give instructions and then exit back to your shell.
    
    Note that upon full completion of this script, the system will be
    rebooted using reboot(1M), which will interrupt any active services.

    If the system that you are recovering is not a local zone, the script explains how to remount the / and /usr file systems with read and write permissions to recover the databases. The script exits after printing these instructions. Follow the instructions, paying special attention to any errors that might occur.

    After the root (/) file system is mounted with write permissions, or if the system is a local zone, you are prompted to select the repository backup to restore:

    The following backups of /etc/svc/repository.db exists, from
    oldest to newest:
    
    ... list of backups ...

    Backups are given names, based on type and the time the backup was taken. Backups beginning with boot are completed before the first change is made to the repository after system boot. Backups beginning with manifest_import are completed after svc:/system/manifest-import:default finishes its process. The time of the backup is given in YYYYMMDD_HHMMSS format.

  3. Enter the appropriate response.

    Typically, the most recent backup option is selected.

    Please enter one of:
            1) boot, for the most recent post-boot backup
            2) manifest_import, for the most recent manifest_import backup.
            3) a specific backup repository from the above list
            4) -seed-, the initial starting repository. (All customizations
               will be lost.)
            5) -quit-, to cancel.
    
    Enter response [boot]:

    If you press Enter without specifying a backup to restore, the default response, enclosed in [] is selected. Selecting -quit- exits the restore_repository script, returning you to your shell prompt.

    Note

    Selecting -seed- restores the seed repository. This repository is designed for use during initial installation and upgrades. Using the seed repository for recovery purposes should be a last resort.

    After the backup to restore has been selected, it is validated and its integrity is checked. If there are any problems, the restore_repository command prints error messages and prompts you for another selection. Once a valid backup is selected, the following information is printed, and you are prompted for final confirmation.

    After confirmation, the following steps will be taken:
    
    svc.startd(1M) and svc.configd(1M) will be quiesced, if running.
    /etc/svc/repository.db
        -- renamed --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS
    /etc/svc/volatile/db_errors
        -- copied --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS_errors
    repository_to_restore
        -- copied --> /etc/svc/repository.db
    and the system will be rebooted with reboot(1M).
    
    Proceed [yes/no]?
  4. Type yes to remedy the fault.

    The system reboots after the restore_repository command executes all of the listed actions.

How to Boot Without Starting Any Services

If problems with starting services occur, sometimes a system will hang during the boot. This procedure shows how to troubleshoot this problem.

  1. Boot without starting any services.

    This command instructs the svc.startd daemon to temporarily disable all services and start sulogin on the console.

    ok boot -m milestone=none
    
  2. Log in to the system as root.

  3. Enable all services.

    # svcadm milestone all
    
  4. Determine where the boot process is hanging.

    When the boot process hangs, determine which services are not running by running svcs a. Look for error messages in the log files in /var/svc/log.

  5. After fixing the problems, verify that all services have started.

    1. Verify that all needed services are online.

      # svcs -x
      
    2. Verify that the console-login service dependencies are satisfied.

      This command verifies that the login process on the console will run.

      # svcs -l system/console-login:default
      
  6. Continue the normal booting process.

How to Force a sulogin Prompt If the system/filesystem/local:default Service Fails During Boot

Local file systems that are not required to boot the Solaris OS are mounted by the svc:/system/filesystem/local:default service. When any of those file systems are unable to be mounted, the service enters a maintenance state. System startup continues, and any services which do not depend on filesystem/local are started. Services which require filesystem/local to be online before starting through dependencies are not started.

To change the configuration of the system so that a sulogin prompt appears immediately after the service fails instead of allowing system startup to continue, follow the procedure below.

  1. Modify the system/console-login service.

    # svccfg -s svc:/system/console-login
    svc:/system/console-login> addpg site,filesystem-local dependency
    svc:/system/console-login> setprop site,filesystem-local/entities = fmri: svc:/system/filesystem/local
    svc:/system/console-login> setprop site,filesystem-local/grouping = astring: require_all
    svc:/system/console-login> setprop site,filesystem-local/restart_on = astring: none
    svc:/system/console-login> setprop site,filesystem-local/type = astring: service
    svc:/system/console-login> end
    
  2. Refresh the service.

    # svcadm refresh console-login
    

Example 15.18. Forcing an sulogin Prompt Using Jumpstart

Save the following commands into a script and save it as /etc/rcS.d/S01site-customfs.

#!/bin/sh
#
# This script adds a dependency from console-login -> filesystem/local
# This forces the system to stop the boot process and drop to an sulogin prompt
# if any file system in filesystem/local fails to mount.

PATH=/usr/sbin:/usr/bin
export PATH

	svccfg -s svc:/system/console-login << EOF
addpg site,filesystem-local dependency
setprop site,filesystem-local/entities = fmri: svc:/system/filesystem/local
setprop site,filesystem-local/grouping = astring: require_all
setprop site,filesystem-local/restart_on = astring: none
setprop site,filesystem-local/type = astring: service
EOF

svcadm refresh svc:/system/console-login

[ -f /etc/rcS.d/S01site-customfs ] &&
    rm -f /etc/rcS.d/S01site-customfs

When a failure occurs with the system/filesystem/local:default service, the svcs vx command should be used to identify the failure. After the failure has been fixed, the following command clears the error state and allows the system boot to continue: svcadm clear filesystem/local.