This section describes how to replace disks in a Solaris Volume Manager environment.
If you have soft partitions on a failed disk or on volumes
that are built on a failed disk, you must put the new disk in the same physical
location. Also, use the same cntndn number as the disk being replaced.
How to Replace a Failed Disk
Identify the failed disk to be replaced by examining the /var/adm/messages
file and the metastat command output.
Locate any state database replicas that might have been placed on the failed disk.
Use the metadb command to find the replicas.
The metadb command might report errors for the state database replicas that
are located on the failed disk. In this example, c0t1d0 is the problem device.
# metadb
        flags           first blk       block count
     a m     u          16              1034            /dev/dsk/c0t0d0s4
     a       u          1050            1034            /dev/dsk/c0t0d0s4
     a       u          2084            1034            /dev/dsk/c0t0d0s4
      W   pc luo        16              1034            /dev/dsk/c0t1d0s4
      W   pc luo        1050            1034            /dev/dsk/c0t1d0s4
      W   pc luo        2084            1034            /dev/dsk/c0t1d0s4
The output shows three state database replicas on slice 4 of each of the
local disks, c0t0d0 and c0t1d0. The W in the flags field of the c0t1d0s4 slice
indicates that the device has write errors. The three replicas on the c0t0d0s4
slice are still good.
Record the slice name where the state database replicas reside and the number of state database replicas. Then, delete the state database replicas.
The number of state database replicas is obtained by counting the number of
appearances of a slice in the metadb command output. In this example, the three
state database replicas that exist on c0t1d0s4 are deleted.
# metadb -d c0t1d0s4
If, after deleting the bad state database replicas, you are left with three or fewer, add more state database replicas before continuing. Doing so helps to ensure that configuration information remains intact.
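The counting described in this step can be scripted. The following is only a sketch: it embeds the sample metadb output from this example instead of calling metadb, and summarize_replicas is a hypothetical helper name, not part of Solaris Volume Manager. It assumes that the device path is the last field on each replica line, that the two fields before it are the first block and the block count, and that everything earlier on the line belongs to the flags field.

```shell
#!/bin/sh
# Sample metadb output copied from the example in this procedure.
sample_metadb_output() {
cat <<'EOF'
        flags           first blk       block count
     a m     u          16              1034            /dev/dsk/c0t0d0s4
     a       u          1050            1034            /dev/dsk/c0t0d0s4
     a       u          2084            1034            /dev/dsk/c0t0d0s4
      W   pc luo        16              1034            /dev/dsk/c0t1d0s4
      W   pc luo        1050            1034            /dev/dsk/c0t1d0s4
      W   pc luo        2084            1034            /dev/dsk/c0t1d0s4
EOF
}

# Hypothetical helper: for each slice, print the slice name, the number of
# replicas found on it, and whether any replica carries the W flag.
summarize_replicas() {
    awk '
        $NF ~ /^\/dev\// {
            # Join every field before the last three into one flags string.
            flags = ""
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            count[$NF]++
            if (flags ~ /W/) bad[$NF] = 1
        }
        END {
            for (s in count)
                printf "%s %d %s\n", s, count[s], ((s in bad) ? "write-errors" : "ok")
        }' | sort
}

sample_metadb_output | summarize_replicas
```

On a live system, the equivalent would be to pipe the real metadb output into the helper. The count reported for the failed slice is the number to record before deleting the replicas.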
Locate and delete any hot spares on the failed disk.
Use the metastat command to find hot spares. In this example, hot spare pool
hsp000 included c0t1d0s6, which is then deleted from the pool.
# metahs -d hsp000 c0t1d0s6
hsp000: Hotspare is deleted
Replace the failed disk.
This step might entail using the cfgadm command, the luxadm command, or other commands as appropriate for your hardware and environment. When performing this step, make sure to follow your hardware's documented procedures to properly manipulate the Solaris state of this disk.
Repartition the new disk.
Use the format command or the fmthard command to partition the disk with the
same slice information as the failed disk. If you have the prtvtoc output from
the failed disk, you can apply it to the replacement disk with the
fmthard -s /tmp/failed-disk-prtvtoc-output command, followed by the raw device
name of the replacement disk.
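As a sketch of how the prtvtoc and fmthard commands pair up, the following helper prints the two commands without running them. The helper name print_repartition_commands is hypothetical, and the use of slice 2 as the whole-disk raw device is an assumption that is not stated in this procedure. Verify the device name for your disk label before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print, but do not run, the two commands involved.
# $1 = disk name (for example c0t1d0)
# $2 = file that holds the saved prtvtoc output
print_repartition_commands() {
    disk=$1
    vtoc_file=$2
    # Assumption: slice 2 is the whole-disk slice on this disk.
    raw_device=/dev/rdsk/${disk}s2
    # Save the partition table (only possible while the old disk is readable).
    echo "prtvtoc $raw_device > $vtoc_file"
    # Write the saved table to the replacement disk.
    echo "fmthard -s $vtoc_file $raw_device"
}

print_repartition_commands c0t1d0 /tmp/failed-disk-prtvtoc-output
```

Printing the commands first gives you a chance to review the device names before you write a partition table to a disk.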
If you deleted state database replicas, add the same number back to the appropriate slice.
In this example, /dev/dsk/c0t1d0s4 is used.
# metadb -a -c 3 c0t1d0s4
If any slices on the disk are components of RAID-5 volumes or are components of RAID-0 volumes that are in turn submirrors of RAID-1 volumes, run the metareplace -e command for each slice.
In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.
# metareplace -e d10 c0t1d0s4
If any soft partitions are built directly on slices on the replaced disk, run the metarecover -m -p command on each slice that contains soft partitions. This command regenerates the extent headers on disk.
In this example, /dev/dsk/c0t1d0s4 needs to have the soft partition markings
on disk regenerated. The slice is scanned and the markings are reapplied, based
on the information in the state database replicas.
# metarecover c0t1d0s4 -m -p
If any soft partitions on the disk are components of RAID-5 volumes or are components of RAID-0 volumes that are submirrors of RAID-1 volumes, run the metareplace -e command for each slice.
In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.
# metareplace -e d10 c0t1d0s4
If any RAID-0 volumes have soft partitions built on them, run the metarecover command for each RAID-0 volume.
In this example, RAID-0 volume d17 has soft partitions built on it.
# metarecover d17 -m -p
Replace hot spares that were deleted, and add them to the appropriate hot spare pool or pools.
In this example, hot spare pool hsp000 included c0t1d0s6.
This slice is added to the hot spare pool.
# metahs -a hsp000 c0t1d0s6
hsp000: Hotspare is added
If soft partitions or nonredundant volumes were affected by the failure, restore data from backups. If only redundant volumes were affected, then validate your data.
Check the user and application data on all volumes. You might have to run an application-level consistency checker, or use some other method to check the data.