ServiceGuard starts the mirror but not the VG

I was building a new ServiceGuard cluster, version 12.30 on Red Hat 7.5, and ran into an unexpected problem: the cluster package brought up the software mirror, but the volume group sitting on the metadevice was not detected, so the service package could not start. Here are the details and the solution to the problem:

  • The cluster is stopped and I try to start it, without success:

[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA down

NODE STATUS STATE
lsktdot01 down unknown
lsktdot02 down unknown

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta down halted enabled unowned
[root@lsktdot01 ~]# cmruncl
cmruncl: Validating network configuration…
cmruncl: Network validation complete
Checking for license………
Found Valid Enterprise License
Number of Enterprise licenses:1
Waiting for cluster to form …. done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
[root@lsktdot01 ~]#
[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA up

NODE STATUS STATE
lsktdot01 up running
lsktdot02 up running

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta down failed enabled unowned
[root@lsktdot01 ~]#

  • I check the ServiceGuard package log; it seems to bring up the mirror but then hits a problem with LVM:

Dec 24 14:03:17 root@lsktdot01 master_control_script.sh[10065]: ###### Starting package pkgskta ######
Dec 24 14:03:17 root@lsktdot01 xdc.sh[10153]: This package is configured with remote data replication.
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: Starting MD RAID1 Mirroring
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: Checking availability of component devices of /dev/md0.
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: udevinfo of /dev/mapper/vgskta1md0 is : dm-2
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing cmcheckdisk on /dev/dm-2
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Disk /dev/mapper/vgskta1md0 is available
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Clear PR keys on /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10197]: sg_activate_pr: Starting PR operation on multipath device /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10197]: sg_activate_pr: activating PR on /dev/sde
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10309]: sg_deactivate_pr: deactivating PR on /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Activating /dev/md0 with /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing command mdadm -A -R /dev/md0 /dev/mapper/vgskta1md0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Successfully activated /dev/md0 with /dev/mapper/vgskta1md0
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: udevinfo of /dev/mapper/vgskta2md0 is : dm-3
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing cmcheckdisk on /dev/dm-3
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Disk /dev/mapper/vgskta2md0 is available
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Clear PR keys on /dev/mapper/vgskta2md0
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10383]: sg_activate_pr: Starting PR operation on multipath device /dev/mapper/vgskta2md0
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10383]: sg_activate_pr: activating PR on /dev/sdb
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10471]: sg_deactivate_pr: deactivating PR on /dev/mapper/vgskta2md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: pvdisplay shows /dev/md0 belongs to the volume group :
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: Deactivting raid devices after mapping the volume group information
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: deactivate_raid: Deactivating md /dev/md0 using mdadm -S command
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: deactivate_raid: Executing mdadm -S /dev/md0
grep: /usr/local/cmcluster/xdc/mdstate/activation_list.10153: No such file or directory
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: activate_raid: ERROR: Both halves of the mirror not available
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: Cannot start MD /dev/md0 on Node "lsktdot01"
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/md_vg_map.10153:
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: md_name:/dev/md0|device_0=/dev/mapper/vgskta1md0
md_name:/dev/md0|device_1=/dev/mapper/vgskta2md0
vg_name:NULL|md_device=/dev/md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/vg_check_dev_list.10153:
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: md_name:/dev/md0|device_0=/dev/mapper/vgskta1md0
md_name:/dev/md0|device_1=/dev/mapper/vgskta2md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/pvd_output.10153
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: Failed to find physical volume "/dev/md0".
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: test_return: ERROR: Failed to start MD Array.
Dec 24 14:03:21 root@lsktdot01 master_control_script.sh[10065]: ##### Failed to start package pkgskta, rollback steps #####
Dec 24 14:03:21 root@lsktdot01 xdc.sh[10593]: This package is configured with remote data replication.
Dec 24 14:03:21 root@lsktdot01 raid_control[10593]: Halting MD RAID1 Mirroring
Dec 24 14:03:21 root@lsktdot01 master_control_script.sh[10065]: ###### Failed to start package for pkgskta ######

  • I assemble the mirror manually to find out whether there is a real problem with the mirror or with LVM, or whether it is something specific to the ServiceGuard product:

The mirror assembles without problems:

[root@lsktdot01 pkgSKTA]# mdadm -A -R md0 /dev/mapper/vgskta1md0 /dev/mapper/vgskta2md0
mdadm: /dev/md/md0 has been started with 2 drives.
mdadm: timeout waiting for /dev/md/md0
[root@lsktdot01 pkgSKTA]# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Dec 19 08:27:57 2018
Raid Level : raid1
Array Size : 524155904 (499.87 GiB 536.74 GB)
Used Dev Size : 524155904 (499.87 GiB 536.74 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Mon Dec 24 13:58:42 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Consistency Policy : bitmap

Name : lsktdot01:0 (local to host lsktdot01)
UUID : 4e1cab9d:6ba08ef6:260bbc5d:45db8395
Events : 632

Number Major Minor RaidDevice State
0 253 2 0 active sync /dev/dm-2
1 253 3 1 active sync /dev/dm-3
[root@lsktdot01 pkgSKTA]#
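
As an extra check, /proc/mdstat should also list both mirror halves as active; a quick look (not strictly needed, mdadm --detail already said so) is simply:

# A healthy two-way mirror shows both dm members and a [2/2] [UU] status
cat /proc/mdstat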

But the VG that sits on the metadevice is not detected:

[root@lsktdot01 pkgSKTA]# vgs
VG #PV #LV #SN Attr VSize VFree
vg00 1 12 0 wz--n- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#

[root@lsktdot01 pkgSKTA]# vgscan
Reading volume groups from cache.
Found volume group "vg00" using metadata type lvm2
[root@lsktdot01 pkgSKTA]#

[root@lsktdot01 pkgSKTA]# vgs
VG #PV #LV #SN Attr VSize VFree
vg00 1 12 0 wz--n- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#
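
The telling detail is that vgscan reports "Reading volume groups from cache." On RHEL 7 that cache is served by the lvmetad daemon, and a physical volume that only exists once /dev/md0 has been assembled is exactly the kind of device a stale cache can miss. A quick way to confirm that LVM is in fact relying on lvmetad:

# use_lvmetad = 1 means LVM answers from the lvmetad cache instead of scanning devices
grep -E '^[[:space:]]*use_lvmetad' /etc/lvm/lvm.conf
systemctl status lvm2-lvmetad.socket lvm2-lvmetad.service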

  • I force a rescan of the physical volumes so that the lvmetad cache is rebuilt straight from the devices and, this time, the VG that lives on the mirror device (vgskta) does show up:

[root@lsktdot01 pkgSKTA]# pvscan --cache
[root@lsktdot01 pkgSKTA]# pvs
PV VG Fmt Attr PSize PFree
/dev/md0 vgskta lvm2 a-- <499.86g 0
/dev/sda2 vg00 lvm2 a-- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#
[root@lsktdot01 pkgSKTA]# vgscan
Reading volume groups from cache.
Found volume group "vgskta" using metadata type lvm2
Found volume group "vg00" using metadata type lvm2
[root@lsktdot01 pkgSKTA]#
[root@lsktdot01 pkgSKTA]# lvscan
inactive '/dev/vgskta/lvsktadmin' [10.00 GiB] inherit
inactive '/dev/vgskta/lvsktasoft' [20.00 GiB] inherit
inactive '/dev/vgskta/lvsktarch' [20.00 GiB] inherit
inactive '/dev/vgskta/lvsktadata' [<449.86 GiB] inherit
ACTIVE '/dev/vg00/swapvol' [16.00 GiB] inherit
ACTIVE '/dev/vg00/crashvol' [6.40 GiB] inherit
ACTIVE '/dev/vg00/auditvol' [1.50 GiB] inherit
ACTIVE '/dev/vg00/varvol' [25.00 GiB] inherit
ACTIVE '/dev/vg00/tmpvol' [10.00 GiB] inherit
ACTIVE '/dev/vg00/rhomevol' [2.00 GiB] inherit
ACTIVE '/dev/vg00/optvol' [25.00 GiB] inherit
ACTIVE '/dev/vg00/homevol' [4.00 GiB] inherit
ACTIVE '/dev/vg00/rootvol' [10.00 GiB] inherit
ACTIVE '/dev/vg00/openv' [8.00 GiB] inherit
ACTIVE '/dev/vg00/lvstats' [500.00 MiB] inherit
ACTIVE '/dev/vg00/lvplanific' [500.00 MiB] inherit
[root@lsktdot01 pkgSKTA]#
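
At this point the whole stack can also be exercised by hand if you want to be completely sure it is healthy before blaming the cluster software; a minimal sketch (using one of the LVs and the mount point the package itself uses, as seen in the df output further down), undoing everything afterwards so ServiceGuard can take over again:

vgchange -ay vgskta                       # activate the VG on top of /dev/md0
mount /dev/vgskta/lvsktadata /bdd/SKTA    # mount one of the LVs as a test
umount /bdd/SKTA
vgchange -an vgskta                       # deactivate the VG again
mdadm -S /dev/md0                         # stop the mirror so the package can assemble it itself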

DEFINITIVE SOLUTION

  • We disable the lvmetad service (its stale metadata cache is what was hiding the VG from the package; without it, LVM scans the devices directly):

[root@lsktdot01 ~]# systemctl stop lvm2-lvmetad.socket
[root@lsktdot01 ~]# systemctl stop lvm2-lvmetad.service
[root@lsktdot01 ~]# systemctl disable lvm2-lvmetad.socket
Removed symlink /etc/systemd/system/sysinit.target.wants/lvm2-lvmetad.socket.
[root@lsktdot01 ~]# systemctl disable lvm2-lvmetad.service
[root@lsktdot01 log]# systemctl disable lvm2-lvmetad
[root@lsktdot01 log]# systemctl stop lvm2-lvmetad.socket
[root@lsktdot01 log]# systemctl disable lvm2-lvmetad.socket
[root@lsktdot01 log]# systemctl mask lvm2-lvmetad.socket
Created symlink from /etc/systemd/system/lvm2-lvmetad.socket to /dev/null.
[root@lsktdot01 log]#
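
With the daemon and socket masked, it is also worth telling LVM itself not to expect the cache any more; otherwise, on RHEL 7, every LVM command keeps warning that it failed to connect to lvmetad before falling back to device scanning. Something along these lines (plus a dracut -f afterwards if you want the copy of lvm.conf inside the initramfs refreshed too):

# Switch LVM from the lvmetad cache to direct device scanning
sed -i 's/^\([[:space:]]*\)use_lvmetad = 1/\1use_lvmetad = 0/' /etc/lvm/lvm.conf
grep -E '^[[:space:]]*use_lvmetad' /etc/lvm/lvm.conf   # should now show 0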

  • I reboot the server to make sure I will not hit the same problem again on future restarts.
  • I start the cluster, and this time it comes up correctly:

[root@lsktdot01 ~]# cmruncl
cmruncl: Validating network configuration…
cmruncl: Network validation complete
Checking for license………
Found Valid Enterprise License
Number of Enterprise licenses:1
Waiting for cluster to form …. done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA up

NODE STATUS STATE
lsktdot01 up running

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta up running enabled lsktdot01

NODE STATUS STATE
lsktdot02 up running
[root@lsktdot01 ~]#

[root@lsktdot01 ~]# df -hP |grep -i vgskta
/dev/mapper/vgskta-lvsktasoft 20G 13G 7.4G 64% /opt/oracle
/dev/mapper/vgskta-lvsktarch 20G 1.6G 19G 8% /bdd/SKTA_ARCH
/dev/mapper/vgskta-lvsktadata 450G 4.4G 446G 1% /bdd/SKTA
/dev/mapper/vgskta-lvsktadmin 10G 455M 9.6G 5% /opt/oracle/admin/SKTA
[root@lsktdot01 ~]#
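
As a last check it is worth moving the package over to the second node and back, since the same lvmetad change is presumably needed on lsktdot02 as well; roughly:

cmhaltpkg pkgskta                  # halt the package on lsktdot01
cmrunpkg -n lsktdot02 pkgskta      # start it on the other node
cmmodpkg -e pkgskta                # re-enable AUTO_RUN after the manual move
cmviewcl -v -p pkgskta             # confirm it is up, then move it back the same way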
