Because Solaris Volume Manager enables you to mirror the root (/), swap, and /usr directories, special problems can arise when you boot the system. These problems can arise either through hardware failures or operator error. The procedures in this section provide solutions to such potential problems.
The following table describes these problems and points you to the appropriate solution.
Table 25.1. Common Boot Problems With Solaris Volume Manager
Reason for the Boot Problem | For Instructions |
---|---|
The /etc/vfstab file contains incorrect information. | How to Recover From Improper /etc/vfstab Entries |
Not enough state database replicas have been defined. | How to Recover From Insufficient State Database Replicas |
A boot device (disk) has failed. | How to Recover From a Boot Device Failure |
If Solaris Volume Manager takes a volume offline due to errors, unmount all file systems on the disk where the failure occurred.
Because each disk slice is independent, multiple file systems can be mounted on a single disk. If the software has encountered a failure, other slices on the same disk will likely experience failures soon. File systems that are mounted directly on disk slices do not have the protection of Solaris Volume Manager error handling. Leaving such file systems mounted can leave you vulnerable to crashing the system and losing data.
Minimize the amount of time you run with submirrors that are disabled or offline. During resynchronization and online backup intervals, the full protection of mirroring is gone.
If you have made an incorrect entry in the /etc/vfstab file, for example, when mirroring the root (/) file system, the system appears at first to be booting properly. Then, the system fails. To remedy this situation, you need to edit the /etc/vfstab file while in single-user mode.
The high-level steps to recover from improper /etc/vfstab file entries are as follows:
Booting the system to single-user mode
Running the fsck command on the mirror volume
Remounting the file system with read-write options enabled
Optional: running the metaroot command for a root (/) mirror
Verifying that the /etc/vfstab file correctly references the volume for the file system entry
Rebooting the system
Recovering the root (/) RAID-1 (Mirror) Volume
In the following example, the root (/) file system is mirrored with a two-way mirror, d0. The root (/) entry in the /etc/vfstab file has somehow reverted to the original slice of the file system. However, the information in the /etc/system file still shows booting to be from the mirror d0. The most likely reason is that the metaroot command was not used to maintain the /etc/system and /etc/vfstab files. Another possible reason is that an old copy of the /etc/vfstab file was copied back into the current /etc/vfstab file.
The incorrect /etc/vfstab file looks similar to the following:
#device            device              mount  FS     fsck  mount    mount
#to mount          to fsck             point  type   pass  at boot  options
#
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /      ufs    1     no       -
/dev/dsk/c0t3d0s1  -                   -      swap   -     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr   ufs    2     no       -
#
/proc              -                   /proc  proc   -     no       -
swap               -                   /tmp   tmpfs  -     yes      -
Because of the errors, you automatically go into single-user mode when the system is booted:
ok boot
...
configuring network interfaces: hme0.
Hostname: host1
mount: /dev/dsk/c0t3d0s0 is not this fstype.
setmnt: Cannot open /etc/mnttab for writing
INIT: Cannot create /var/adm/utmp or /var/adm/utmpx
INIT: failed write of utmpx entry:" "
INIT: failed write of utmpx entry:" "
INIT: SINGLE USER MODE
Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>
At this point, the root (/) and /usr file systems are mounted read-only. Follow these steps:
Run the fsck command on the root (/) mirror.
Be careful to use the correct volume for the root (/) mirror.
# fsck /dev/md/rdsk/d0
** /dev/md/rdsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2274 files, 11815 used, 10302 free (158 frags, 1268 blocks,
0.7% fragmentation)
Remount the root (/) file system as a read/write file system so that you can edit the /etc/vfstab file.
# mount -o rw,remount /dev/md/dsk/d0 /
mount: warning: cannot lock temp file </etc/.mnt.lock>
Run the metaroot command.
# metaroot d0
This command edits the /etc/system and /etc/vfstab files to specify that the root (/) file system is now on volume d0.
Verify that the /etc/vfstab file contains the correct volume entries.
The root (/) entry in the /etc/vfstab file should appear as follows so that the entry for the file system correctly references the RAID-1 volume:
#device            device              mount  FS     fsck  mount    mount
#to mount          to fsck             point  type   pass  at boot  options
#
/dev/md/dsk/d0     /dev/md/rdsk/d0     /      ufs    1     no       -
/dev/dsk/c0t3d0s1  -                   -      swap   -     no       -
/dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr   ufs    2     no       -
#
/proc              -                   /proc  proc   -     no       -
swap               -                   /tmp   tmpfs  -     yes      -
Reboot the system.
The system returns to normal operation.
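The verification step in this procedure can also be scripted. The following is a minimal sketch and not part of the original procedure: the function name root_device is hypothetical, and the sample input is the corrected /etc/vfstab from this example. The sketch prints the device that is listed for the root (/) mount point and reports whether that device is a volume under /dev/md/dsk.

```shell
#!/bin/sh
# Hypothetical helper: print the "device to mount" field of the entry
# whose mount point is "/". Reads vfstab-formatted text on standard input.
root_device() {
    awk '$1 !~ /^#/ && $3 == "/" { print $1; exit }'
}

# Sample input copied from the corrected /etc/vfstab in this example.
sample='#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -
/dev/dsk/c0t3d0s1 - - swap - no -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 2 no -
#
/proc - /proc proc - no -
swap - /tmp tmpfs - yes -'

dev=$(printf '%s\n' "$sample" | root_device)

# Report whether the root entry references a volume or a plain slice.
case $dev in
    /dev/md/dsk/*) echo "root (/) is on volume $dev" ;;
    *)             echo "root (/) is NOT on a volume: $dev" ;;
esac
```

On a live system, the equivalent call would be root_device < /etc/vfstab.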
How to Recover From a Boot Device Failure
If you have a root (/) mirror and your boot device fails, you need to set up an alternate boot device.
The high-level steps in this task are as follows:
Booting from the alternate root (/) submirror
Determining the erred state database replicas and volumes
Repairing the failed disk
Restoring state database replicas and volumes to their original state
Initially, when the boot device fails, you'll see a message similar to the following. This message might differ among various architectures.
Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0
The selected SCSI device is not responding
Can't open boot device
...
When you see this message, note the device. Then, follow these steps:
Boot from another root (/) submirror.
Since only two of the six state database replicas in this example are in error, you can still boot. If this were not the case, you would need to delete the inaccessible state database replicas in single-user mode. This procedure is described in How to Recover From Insufficient State Database Replicas.
When you created the mirror for the root (/) file system, you should have recorded the alternate boot device as part of that procedure. In this example, disk2 is that alternate boot device.
ok boot disk2
SunOS Release 5.9 Version s81_51 64-bit
Copyright 1983-2001 Sun Microsystems, Inc. All rights reserved.
Hostname: demo
...
demo console login: root
Password: <root-password>
Dec 16 12:22:09 host1 login: ROOT LOGIN /dev/console
Last login: Wed Dec 12 10:55:16 on console
Sun Microsystems Inc. SunOS 5.9 s81_51 May 2002
...
Determine how many state database replicas have failed by using the metadb command.
# metadb
flags first blk block count
M p unknown unknown /dev/dsk/c0t3d0s3
M p unknown unknown /dev/dsk/c0t3d0s3
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
In this example, the system can no longer detect state database replicas on slice /dev/dsk/c0t3d0s3, which is part of the failed disk.
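The replica count can also be tallied with a short script. The following is a minimal sketch under stated assumptions: the function name count_replicas is hypothetical, an uppercase flag letter (such as M) is treated as marking a replica in error, and the test for "more than half usable" reflects the majority rule that Solaris Volume Manager applies to state database replicas. The sample input is the metadb output from this example.

```shell
#!/bin/sh
# Hypothetical helper: read metadb output on standard input and print
# "<total replicas> <replicas with error flags>".
count_replicas() {
    LC_ALL=C awk '
        $NF ~ /^\/dev\// {
            total++
            # Every field before the last three (first blk, block count,
            # device) is a status flag.
            flags = ""
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            # Assumption: an uppercase flag letter marks a replica in error.
            if (flags ~ /[A-Z]/) bad++
        }
        END { printf "%d %d\n", total, bad }
    '
}

# Sample input copied from the metadb output in this example.
sample='flags first blk block count
M p unknown unknown /dev/dsk/c0t3d0s3
M p unknown unknown /dev/dsk/c0t3d0s3
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3'

set -- $(printf '%s\n' "$sample" | count_replicas)
total=$1
bad=$2
good=$((total - bad))

echo "$good of $total replicas are usable"
if [ $((good * 2)) -gt "$total" ]; then
    echo "majority available"
else
    echo "no majority"
fi
```

On a live system, the equivalent call would be metadb | count_replicas.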
Determine that half of the root (/), swap, and /usr mirrors have failed by using the metastat command.
# metastat
d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
...
d10: Submirror of d0
State: Needs maintenance
Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s0 0 No Maintenance
d20: Submirror of d0
State: Okay
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s0 0 No Okay
d1: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d21
State: Okay
...
d11: Submirror of d1
State: Needs maintenance
Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"
Size: 69660 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s1 0 No Maintenance
d21: Submirror of d1
State: Okay
Size: 69660 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s1 0 No Okay
d2: Mirror
Submirror 0: d12
State: Needs maintenance
Submirror 1: d22
State: Okay
...
d12: Submirror of d2
State: Needs maintenance
Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"
Size: 286740 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s6 0 No Maintenance
d22: Submirror of d2
State: Okay
Size: 286740 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s6 0 No Okay
In this example, the metastat command shows that the following submirrors need maintenance:
Submirror d10, device c0t3d0s0
Submirror d11, device c0t3d0s1
Submirror d12, device c0t3d0s6
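The same list can be pulled out of the metastat output mechanically. The following is a minimal sketch, assuming the output layout shown in this example: the function name list_failed_submirrors is hypothetical, and the script relies on the Invoke: "metareplace ..." hint line that metastat prints for a submirror that needs maintenance. The sample input is an abbreviated copy of the d0 portion of the output above.

```shell
#!/bin/sh
# Hypothetical helper: read metastat output on standard input and print
# "<submirror> <device>" for each submirror with a metareplace hint.
list_failed_submirrors() {
    LC_ALL=C awk '
        # Remember which submirror block we are inside.
        /^ *d[0-9]+: Submirror of/ { sm = $1; sub(/:$/, "", sm); next }
        # A mirror header ends any submirror block.
        /^ *d[0-9]+: Mirror/       { sm = ""; next }
        # The hint line names the failed device in its fourth field.
        sm != "" && $1 == "Invoke:" && $2 ~ /metareplace/ { print sm, $4 }
    '
}

# Abbreviated sample copied from the metastat output in this example.
sample='d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
d10: Submirror of d0
State: Needs maintenance
Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
Size: 47628 blocks
d20: Submirror of d0
State: Okay
Size: 47628 blocks'

printf '%s\n' "$sample" | list_failed_submirrors
```

On a live system, the equivalent call would be metastat | list_failed_submirrors.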
Halt the system and replace the disk. Use the format command or the fmthard command to partition the disk as it was before the failure.
If the new disk (c0t3d0, in this example) is identical to the existing disk (the intact side of the mirror, c0t2d0 in this example), you can quickly format the new disk by copying the partition table from the existing disk. To do so, use the prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2 command.
# halt
...
Halted
...
ok boot
...
# format /dev/rdsk/c0t3d0s0
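Because the prtvtoc and fmthard pipeline overwrites the label on the target disk, it can help to build the command first and review it before running it. The following is a minimal sketch of that idea: the function name label_copy_cmd is hypothetical, and the function only prints the pipeline from this step. It does not run prtvtoc or fmthard.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the pipeline that copies the
# partition table from a healthy disk to its replacement.
label_copy_cmd() {
    src=$1
    dst=$2
    # Refuse missing arguments and a target that equals the source.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: label_copy_cmd <healthy-disk> <new-disk> (must differ)" >&2
        return 1
    fi
    printf 'prtvtoc /dev/rdsk/%ss2 | fmthard -s - /dev/rdsk/%ss2\n' "$src" "$dst"
}

# Disk names from this example: c0t2d0 is intact, c0t3d0 is the new disk.
label_copy_cmd c0t2d0 c0t3d0
```

The printed line matches the command given in this step, so you can confirm the source and target disks before you paste the command at the root prompt.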
Reboot the system.
Note that you must reboot from the other half of the root (/) mirror. You should have recorded the alternate boot device when you created the mirror.
# halt
...
ok boot disk2
To delete the failed state database replicas and then add them back, use the metadb command.
# metadb
flags first blk block count
M p unknown unknown /dev/dsk/c0t3d0s3
M p unknown unknown /dev/dsk/c0t3d0s3
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
# metadb -d c0t3d0s3
# metadb -c 2 -a c0t3d0s3
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
a u 16 1034 /dev/dsk/c0t3d0s3
a u 1050 1034 /dev/dsk/c0t3d0s3
Re-enable the submirrors by using the metareplace command.
# metareplace -e d0 c0t3d0s0
Device /dev/dsk/c0t3d0s0 is enabled
# metareplace -e d1 c0t3d0s1
Device /dev/dsk/c0t3d0s1 is enabled
# metareplace -e d2 c0t3d0s6
Device /dev/dsk/c0t3d0s6 is enabled
The resynchronization takes some time to complete. After it completes, you can return to booting from the original device.
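One way to confirm that the mirrors are healthy again is to check that no State: line in the metastat output reports anything other than Okay. The following is a minimal sketch under that assumption: the function name all_okay is hypothetical, and the two samples are abbreviated from the State: lines shown earlier in this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if metastat output (on standard
# input) has at least one "State:" line and every one reads "Okay".
all_okay() {
    LC_ALL=C awk '
        $1 == "State:" { seen = 1; if ($2 != "Okay") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Abbreviated sample: d0 while submirror d10 still needs maintenance.
before='d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay'

# Abbreviated sample: d0 with both submirrors healthy.
after='d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d20
State: Okay'

for name in before after; do
    eval "text=\$$name"
    if printf '%s\n' "$text" | all_okay; then
        echo "$name: all submirrors Okay"
    else
        echo "$name: at least one submirror is not Okay"
    fi
done
```

On a live system, the equivalent check would be metastat | all_okay, repeated until it succeeds.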