ServiceGuard starts the mirror but not the VG

I was setting up a new ServiceGuard cluster (version 12.30) on Red Hat 7.5 and ran into an unexpected problem: the cluster package assembled the software mirror, but LVM did not detect the volume group sitting on top of the metadevice, so the service package could not start. Here are the details and the solution:

  • The cluster is down and I try to start it, without success:

[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA down

NODE STATUS STATE
lsktdot01 down unknown
lsktdot02 down unknown

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta down halted enabled unowned
[root@lsktdot01 ~]# cmruncl
cmruncl: Validating network configuration...
cmruncl: Network validation complete
Checking for license.........
Found Valid Enterprise License
Number of Enterprise licenses:1
Waiting for cluster to form .... done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
[root@lsktdot01 ~]#
[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA up

NODE STATUS STATE
lsktdot01 up running
lsktdot02 up running

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta down failed enabled unowned
[root@lsktdot01 ~]#
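
Before going to the log, a quick way to get a bit more context on a failed package is cmviewcl in verbose mode; the sketch below assumes your Serviceguard version supports the -p package filter (this check was not part of the original session):

# Verbose status for the failed package (node switching and failure details)
cmviewcl -v -p pkgskta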

  • I look at the ServiceGuard package log: it does assemble the mirror, but LVM cannot find the physical volume on /dev/md0, so the MD-to-VG mapping comes back empty (vg_name:NULL) and the package rolls back:

Dec 24 14:03:17 root@lsktdot01 master_control_script.sh[10065]: ###### Starting package pkgskta ######
Dec 24 14:03:17 root@lsktdot01 xdc.sh[10153]: This package is configured with remote data replication.
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: Starting MD RAID1 Mirroring
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: Checking availability of component devices of /dev/md0.
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: udevinfo of /dev/mapper/vgskta1md0 is : dm-2
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing cmcheckdisk on /dev/dm-2
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Disk /dev/mapper/vgskta1md0 is available
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Clear PR keys on /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10197]: sg_activate_pr: Starting PR operation on multipath device /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10197]: sg_activate_pr: activating PR on /dev/sde
Dec 24 14:03:17 root@lsktdot01 pr_util.sh[10309]: sg_deactivate_pr: deactivating PR on /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Activating /dev/md0 with /dev/mapper/vgskta1md0
Dec 24 14:03:17 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing command mdadm -A -R /dev/md0 /dev/mapper/vgskta1md0
mdadm: /dev/md0 has been started with 1 drive (out of 2).
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Successfully activated /dev/md0 with /dev/mapper/vgskta1md0
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: udevinfo of /dev/mapper/vgskta2md0 is : dm-3
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Executing cmcheckdisk on /dev/dm-3
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Disk /dev/mapper/vgskta2md0 is available
Dec 24 14:03:18 root@lsktdot01 raid_control[10153]: check_if_device_is_accessible: Clear PR keys on /dev/mapper/vgskta2md0
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10383]: sg_activate_pr: Starting PR operation on multipath device /dev/mapper/vgskta2md0
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10383]: sg_activate_pr: activating PR on /dev/sdb
Dec 24 14:03:18 root@lsktdot01 pr_util.sh[10471]: sg_deactivate_pr: deactivating PR on /dev/mapper/vgskta2md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: pvdisplay shows /dev/md0 belongs to the volume group :
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: get_raid_device_status_and_vg_map: Deactivting raid devices after mapping the volume group information
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: deactivate_raid: Deactivating md /dev/md0 using mdadm -S command
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: deactivate_raid: Executing mdadm -S /dev/md0
grep: /usr/local/cmcluster/xdc/mdstate/activation_list.10153: No such file or directory
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: activate_raid: ERROR: Both halves of the mirror not available
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: Cannot start MD /dev/md0 on Node "lsktdot01"
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/md_vg_map.10153:
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: md_name:/dev/md0|device_0=/dev/mapper/vgskta1md0
md_name:/dev/md0|device_1=/dev/mapper/vgskta2md0
vg_name:NULL|md_device=/dev/md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/vg_check_dev_list.10153:
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: md_name:/dev/md0|device_0=/dev/mapper/vgskta1md0
md_name:/dev/md0|device_1=/dev/mapper/vgskta2md0
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: dumpstatefiles: Logging the contents of /usr/local/cmcluster/xdc/mdstate/pvd_output.10153
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: Failed to find physical volume "/dev/md0".
Dec 24 14:03:21 root@lsktdot01 raid_control[10153]: test_return: ERROR: Failed to start MD Array.
Dec 24 14:03:21 root@lsktdot01 master_control_script.sh[10065]: ##### Failed to start package pkgskta, rollback steps #####
Dec 24 14:03:21 root@lsktdot01 xdc.sh[10593]: This package is configured with remote data replication.
Dec 24 14:03:21 root@lsktdot01 raid_control[10593]: Halting MD RAID1 Mirroring
Dec 24 14:03:21 root@lsktdot01 master_control_script.sh[10065]: ###### Failed to start package for pkgskta ######

  • I assemble the mirror manually to find out whether there is a real problem with the mirror or with LVM, or whether it is something related to the ServiceGuard product:

The mirror assembles without problems:

[root@lsktdot01 pkgSKTA]# mdadm -A -R md0 /dev/mapper/vgskta1md0 /dev/mapper/vgskta2md0
mdadm: /dev/md/md0 has been started with 2 drives.
mdadm: timeout waiting for /dev/md/md0
[root@lsktdot01 pkgSKTA]# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Dec 19 08:27:57 2018
Raid Level : raid1
Array Size : 524155904 (499.87 GiB 536.74 GB)
Used Dev Size : 524155904 (499.87 GiB 536.74 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Mon Dec 24 13:58:42 2018
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Consistency Policy : bitmap

Name : lsktdot01:0 (local to host lsktdot01)
UUID : 4e1cab9d:6ba08ef6:260bbc5d:45db8395
Events : 632

Number Major Minor RaidDevice State
0 253 2 0 active sync /dev/dm-2
1 253 3 1 active sync /dev/dm-3
[root@lsktdot01 pkgSKTA]#
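
A quicker sanity check that both halves of the mirror are active, not shown in the original session, is to look at /proc/mdstat:

# Both component devices should be listed and the array should show [UU]
cat /proc/mdstat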

But the VG that uses the metadevice is not detected:

[root@lsktdot01 pkgSKTA]# vgs
VG #PV #LV #SN Attr VSize VFree
vg00 1 12 0 wz--n- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#

[root@lsktdot01 pkgSKTA]# vgscan
Reading volume groups from cache.
Found volume group "vg00" using metadata type lvm2
[root@lsktdot01 pkgSKTA]#

[root@lsktdot01 pkgSKTA]# vgs
VG #PV #LV #SN Attr VSize VFree
vg00 1 12 0 wz--n- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#
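
At this point another useful test is to ask LVM to ignore lvmetad for a single command; if the PV shows up this way but not with a plain pvs, the stale daemon cache is confirmed as the culprit. A minimal sketch (the --config override is standard LVM, but this check was not part of the original troubleshooting):

# Bypass lvmetad and scan the devices directly for this one command
pvs --config 'global {use_lvmetad=0}' /dev/md0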

  • I force a rescan of the physical volumes so that the lvmetad cache gets refreshed and, this time, LVM does find the VG that lives on the mirror device (vgskta):

[root@lsktdot01 pkgSKTA]# pvscan --cache
[root@lsktdot01 pkgSKTA]# pvs
PV VG Fmt Attr PSize PFree
/dev/md0 vgskta lvm2 a-- <499.86g 0
/dev/sda2 vg00 lvm2 a-- <136.21g <27.33g
[root@lsktdot01 pkgSKTA]#
[root@lsktdot01 pkgSKTA]# vgscan
Reading volume groups from cache.
Found volume group "vgskta" using metadata type lvm2
Found volume group "vg00" using metadata type lvm2
[root@lsktdot01 pkgSKTA]#
[root@lsktdot01 pkgSKTA]# lvscan
inactive '/dev/vgskta/lvsktadmin' [10.00 GiB] inherit
inactive '/dev/vgskta/lvsktasoft' [20.00 GiB] inherit
inactive '/dev/vgskta/lvsktarch' [20.00 GiB] inherit
inactive '/dev/vgskta/lvsktadata' [<449.86 GiB] inherit
ACTIVE '/dev/vg00/swapvol' [16.00 GiB] inherit
ACTIVE '/dev/vg00/crashvol' [6.40 GiB] inherit
ACTIVE '/dev/vg00/auditvol' [1.50 GiB] inherit
ACTIVE '/dev/vg00/varvol' [25.00 GiB] inherit
ACTIVE '/dev/vg00/tmpvol' [10.00 GiB] inherit
ACTIVE '/dev/vg00/rhomevol' [2.00 GiB] inherit
ACTIVE '/dev/vg00/optvol' [25.00 GiB] inherit
ACTIVE '/dev/vg00/homevol' [4.00 GiB] inherit
ACTIVE '/dev/vg00/rootvol' [10.00 GiB] inherit
ACTIVE '/dev/vg00/openv' [8.00 GiB] inherit
ACTIVE '/dev/vg00/lvstats' [500.00 MiB] inherit
ACTIVE '/dev/vg00/lvplanific' [500.00 MiB] inherit
[root@lsktdot01 pkgSKTA]#
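
Since the array was only assembled by hand to diagnose the problem, it is safer to hand everything back in the state ServiceGuard expects before restarting the package. A minimal cleanup sketch, assuming nothing from vgskta was activated or mounted manually beyond the commands above:

# Make sure the VG stays deactivated and stop the manually assembled array
vgchange -a n vgskta
mdadm -S /dev/md0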

DEFINITIVE SOLUTION

  • We disable the lvmetad service (this has to be done on every node of the cluster):

[root@lsktdot01 ~]# systemctl stop lvm2-lvmetad.socket
[root@lsktdot01 ~]# systemctl stop lvm2-lvmetad.service
[root@lsktdot01 ~]# systemctl disable lvm2-lvmetad.socket
Removed symlink /etc/systemd/system/sysinit.target.wants/lvm2-lvmetad.socket.
[root@lsktdot01 ~]# systemctl disable lvm2-lvmetad.service
[root@lsktdot02 log]# systemctl disable lvm2-lvmetad
[root@lsktdot02 log]# systemctl stop lvm2-lvmetad.socket
[root@lsktdot02 log]# systemctl disable lvm2-lvmetad.socket
[root@lsktdot02 log]# systemctl mask lvm2-lvmetad.socket
Created symlink from /etc/systemd/system/lvm2-lvmetad.socket to /dev/null.
[root@lsktdot02 log]#
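
Stopping and masking the units keeps lvmetad from running, but as long as use_lvmetad is still set to 1 in /etc/lvm/lvm.conf the LVM tools will keep trying to contact the daemon and fall back to scanning with a warning. The usual complementary step on RHEL 7, not shown in the session above, is to turn the option off in lvm.conf and rebuild the initramfs so that early boot uses the same setting:

# In /etc/lvm/lvm.conf set:  use_lvmetad = 0
lvmconfig global/use_lvmetad   # verify the effective value (should print use_lvmetad=0)
dracut -f                      # rebuild the initramfs before the reboot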

  • I reboot the server to make sure I will not run into the same problem on future restarts.
  • I start the cluster, this time successfully:

[root@lsktdot01 ~]# cmruncl
cmruncl: Validating network configuration...
cmruncl: Network validation complete
Checking for license.........
Found Valid Enterprise License
Number of Enterprise licenses:1
Waiting for cluster to form .... done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.
[root@lsktdot01 ~]# cmviewcl

CLUSTER STATUS
clsSTKA up

NODE STATUS STATE
lsktdot01 up running

PACKAGE STATUS STATE AUTO_RUN NODE
pkgskta up running enabled lsktdot01

NODE STATUS STATE
lsktdot02 up running
[root@lsktdot01 ~]#

[root@lsktdot01 ~]# df -hP |grep -i vgskta
/dev/mapper/vgskta-lvsktasoft 20G 13G 7.4G 64% /opt/oracle
/dev/mapper/vgskta-lvsktarch 20G 1.6G 19G 8% /bdd/SKTA_ARCH
/dev/mapper/vgskta-lvsktadata 450G 4.4G 446G 1% /bdd/SKTA
/dev/mapper/vgskta-lvsktadmin 10G 455M 9.6G 5% /opt/oracle/admin/SKTA
[root@lsktdot01 ~]#
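
After the reboot it is worth double-checking that lvmetad really stays out of the picture and that the PV on the mirror is visible right away; a quick verification sketch, not part of the original write-up:

# The socket should report "masked" and the PV on /dev/md0 should be listed immediately
systemctl is-enabled lvm2-lvmetad.socket
pvs | grep md0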
