AIX Tip of the Week

Subject: Redundant VIO Server Recovery

Audience: All

Date: October 28, 2005

If you reboot a mirrored VIO server, you can recover the virtual disk on its AIX client partition by running "varyonvg rootvg". Situations where this might apply include rolling updates of the VIO server or rebooting a hung VIO server.

I tested this on a p520 as shown. Each AIX client partition had redundant virtual ethernets and disks (one from each VIO server). I used AIX's network interface backup for network failover (smitty etherchannel), and AIX mirroring for virtual disks (smit mirrorvg). The software versions were:

[Figure: POD4_demo.gif]

I simulated a failure by shutting down one of the mirrored VIO servers. As expected, the AIX partition continued running without interruption. The network failed over transparently, and AIX continued to run on the surviving disk mirror.

After restarting the VIO server, I recovered the "missing" disk in the client partition by running "varyonvg rootvg". This activated the disk and synchronized the stale partitions. The steps (with commentary) are shown below.

This recovery procedure is valid for a VIO failure/reboot. However, it does not address recovering from a failed disk drive. I will cover this in a future AIX tip.

Comments: A common misconception is that the "lspv" command shows disk failures. The "lspv" command shows only the disk status as of boot time; it will not show a subsequent disk failure. To detect a missing disk, use "lsvg -p vgname". To see which logical volumes are stale, use "lsvg -l vgname".
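As a rough illustration of the check described above, the following sketch filters "lsvg -p" output for physical volumes that are not active. The function name missing_pvs is hypothetical, and the sketch assumes the column layout shown in the transcript below (PV name in the first column, PV state in the second).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every physical volume whose
# PV STATE is not "active". Reads "lsvg -p <vgname>" output on stdin.
missing_pvs() {
    # Line 1 is the VG name ("rootvg:"), line 2 is the column header.
    # Data rows follow: PV_NAME, PV STATE, TOTAL PPs, FREE PPs, FREE DISTRIBUTION.
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1 }'
}

# Usage on an AIX client partition:
#   lsvg -p rootvg | missing_pvs
```

With one VIO server down, this would be expected to print the hdisk served by that VIO server; with both mirrors active it prints nothing.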

To facilitate the recovery of multiple clients, you can use the dsh command: /opt/csm/bin/dsh "varyonvg rootvg"
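A minimal sketch of wrapping that dsh call follows. It assumes the CSM dsh "-n" option accepts a comma-separated node list; the function name recover_clients, the DRY_RUN variable, and the host names are hypothetical. By default it only prints the command so you can review it before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (dry run) or execute the dsh recovery command
# for a comma-separated list of AIX client partitions.
DSH=${DSH:-/opt/csm/bin/dsh}    # dsh path as given in this tip

recover_clients() {
    nodes=$1                     # e.g. "lpar1,lpar2" (hypothetical host names)
    cmd='varyonvg rootvg'
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run (the default): show the command without executing it.
        echo "$DSH -n $nodes \"$cmd\""
    else
        # Assumption: "-n" selects the target nodes for CSM dsh.
        "$DSH" -n "$nodes" "$cmd"
    fi
}

# Usage:
#   recover_clients lpar1,lpar2             # prints the command only
#   DRY_RUN=0 recover_clients lpar1,lpar2   # runs it through dsh
```

The dry-run default is a design choice: varyonvg triggers resynchronization on every client at once, so it seems worth confirming the node list first.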


script command is started on Fri Oct 28 05:55:13 PDT 2005.

############################
# Pre-checks
############################

# lspv
hdisk0          00c5f2cd35846965                    rootvg          active
hdisk1          00c5f2cd359e4826                    rootvg          active

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            319         255         63..46..18..64..64
hdisk1            active            319         255         63..46..18..64..64

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     16    32    2    open/syncd    N/A
hd8                 jfs2log    1     2     2    open/syncd    N/A
hd4                 jfs2       1     2     2    open/syncd    /
hd2                 jfs2       35    70    2    open/syncd    /usr
hd9var              jfs2       1     2     2    open/syncd    /var
hd3                 jfs2       1     2     2    open/syncd    /tmp
hd1                 jfs2       1     2     2    open/syncd    /home
hd10opt             jfs2       5     10    2    open/syncd    /opt
local               jfs2       2     4     2    open/syncd    /usr/local

############################
# VIO-Server Shutdown Here #
############################

# Note "lspv" shows physical disk as active. This is normal because
# "lspv" only shows the status of the disk at the last boot, not current status.

# lspv
hdisk0          00c5f2cd35846965                    rootvg          active
hdisk1          00c5f2cd359e4826                    rootvg          active

# The correct command to check disk status is "lsvg -p vgname"

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            319         255         63..46..18..64..64
hdisk1            missing           319         255         63..46..18..64..64

# You can also see the stale PV's and PP's using "lsvg vgname"

# lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  00c5f2cd00004c000000010735847ede
VG STATE:           active                   PP SIZE:        32 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      638 (20416 megabytes)
MAX LVs:            256                      FREE PPs:       510 (16320 megabytes)
LVs:                10                       USED PPs:       128 (4096 megabytes)
OPEN LVs:           9                        QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          1                        STALE PPs:      3
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     16    32    2    open/syncd    N/A
hd8                 jfs2log    1     2     2    open/stale    N/A
hd4                 jfs2       1     2     2    open/stale    /
hd2                 jfs2       35    70    2    open/syncd    /usr
hd9var              jfs2       1     2     2    open/stale    /var
hd3                 jfs2       1     2     2    open/syncd    /tmp
hd1                 jfs2       1     2     2    open/syncd    /home
hd10opt             jfs2       5     10    2    open/syncd    /opt
local               jfs2       2     4     2    open/syncd    /usr/local

###########################
# Restarted VIO Server here
###########################

# Varyonvg sync's the disks

# varyonvg rootvg

############################
# No more stale PV's or PP's
############################

# lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  00c5f2cd00004c000000010735847ede
VG STATE:           active                   PP SIZE:        32 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      638 (20416 megabytes)
MAX LVs:            256                      FREE PPs:       510 (16320 megabytes)
LVs:                10                       USED PPs:       128 (4096 megabytes)
OPEN LVs:           9                        QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

###########################
# hdisk1 is active again
###########################

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            319         255         63..46..18..64..64
hdisk1            active            319         255         63..46..18..64..64

# LV's are sync'd

# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     16    32    2    open/syncd    N/A
hd8                 jfs2log    1     2     2    open/syncd    N/A
hd4                 jfs2       1     2     2    open/syncd    /
hd2                 jfs2       35    70    2    open/syncd    /usr
hd9var              jfs2       1     2     2    open/syncd    /var
hd3                 jfs2       1     2     2    open/syncd    /tmp
hd1                 jfs2       1     2     2    open/syncd    /home
hd10opt             jfs2       5     10    2    open/syncd    /opt
local               jfs2       2     4     2    open/syncd    /usr/local

Script command is complete on Fri Oct 28 06:05:39 PDT 2005.
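The check-then-recover sequence in the transcript above could be scripted along these lines. This is only a sketch: the function names stale_pps and resync_if_stale are hypothetical, and it assumes "lsvg vgname" prints a "STALE PPs:" field followed by the count, as in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: extract the "STALE PPs:" count from
# "lsvg <vgname>" output read on stdin.
stale_pps() {
    awk '{ for (i = 1; i < NF - 1; i++)
               if ($i == "STALE" && $(i + 1) == "PPs:") { print $(i + 2); exit } }'
}

# Hypothetical helper: run varyonvg only when the volume group reports
# stale partitions. The volume group defaults to rootvg.
resync_if_stale() {
    vg=${1:-rootvg}
    stale=$(lsvg "$vg" | stale_pps)
    if [ -z "$stale" ]; then
        # lsvg failed or its output did not match the assumed layout.
        echo "$vg: could not read STALE PPs from lsvg" >&2
        return 1
    fi
    if [ "$stale" -gt 0 ]; then
        echo "$vg: $stale stale PPs, running varyonvg"
        varyonvg "$vg"
    else
        echo "$vg: no stale PPs"
    fi
}

# Usage on the AIX client, after the VIO server is back up:
#   resync_if_stale rootvg
```

Afterward, "lsvg -p rootvg" and "lsvg -l rootvg" can confirm that both disks are active and all logical volumes are syncd, as in the transcript.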



Bruce Spencer,
baspence@us.ibm.com

October 28, 2005