Hotfixes / Known issues
Note
This page lists known issues per release and the manual workaround to apply on a running system until the permanent fix lands in an update release. For general installation problems, see Installation troubleshooting; for operational problems after installation, see Operational troubleshooting. New features and resolved bugs per release are listed in the Release Notes.
16 / 15.3 — DS389 runtime directories missing on a rebooted standby controller
Phase: operational / HA failover
Applies to TrinityX 15.3 and 16 on an HA controller pair using the ds389 LDAP backend. Resolved permanently in TRIX-1915.
After a standby controller reboots, promoting the Trinity stack onto it leaves ds389 stuck, and every resource ordered behind it (mariadb, slurmdbd, slurmctld, luna2-master, alertx-history) stays stopped:
# pcs status — ds389 on the rebooted standby
ds389 (systemd:dirsrv@local): Stopped
ds389 start on <standby> could not be executed (Error: systemd start job for dirsrv@local.service failed with result 'failed')
# ns-slapd journal on that node
EMERG - main - Unable to access nsslapd-rundir: No such file or directory
EMERG - main - Ensure that user "dirsrv" has read and write permissions on /run/dirsrv
Cause: On the active controller, dscreate drops /etc/tmpfiles.d/dirsrv-local.conf, so systemd-tmpfiles recreates /run/dirsrv and /run/lock/dirsrv/slapd-local on every boot. On the standby, dscreate never runs, so there is no drop-in and the directories were created only once, at install time. Because /run is a tmpfs, a reboot of the standby removes them and nothing recreates them, so the next failover of ds389 onto that node fails. The active controller is unaffected.
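To see which state a given controller is in before applying the fix, the cause above can be turned into a small check. This is a minimal sketch, not part of TrinityX: the function name ds389_rundir_state and its state labels are hypothetical, and the two paths are parameters (defaulting to the drop-in and runtime directory named above) so the logic can be tried anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a controller according to the cause above.
#   ok      - drop-in present and /run/dirsrv exists
#   pending - drop-in present, but systemd-tmpfiles --create has not run yet
#   exposed - no drop-in; /run/dirsrv exists now but is lost on the next reboot
#   broken  - no drop-in and no /run/dirsrv; ds389 cannot start on this node
ds389_rundir_state() {
    local dropin="${1:-/etc/tmpfiles.d/dirsrv-local.conf}"
    local rundir="${2:-/run/dirsrv}"
    if [ -f "$dropin" ] && [ -d "$rundir" ]; then
        echo ok
    elif [ -f "$dropin" ]; then
        echo pending
    elif [ -d "$rundir" ]; then
        echo exposed
    else
        echo broken
    fi
}
```

Run ds389_rundir_state with no arguments on each controller: the active one should report ok, while an unpatched standby reports exposed before a reboot and broken after one.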
FIX: Install the same drop-in by hand on each standby controller. No reboot and no Ansible run are required, and the active controller needs no change.
# on each standby controller -- install the drop-in dscreate puts on the active
cat > /etc/tmpfiles.d/dirsrv-local.conf <<'EOF'
d /run/dirsrv 0770 dirsrv dirsrv
d /run/lock/dirsrv/ 0770 dirsrv dirsrv
d /run/lock/dirsrv/slapd-local 0770 dirsrv dirsrv
EOF
# materialise the directories now (no reboot needed)
systemd-tmpfiles --create /etc/tmpfiles.d/dirsrv-local.conf
If ds389 has already failed over to a standby and is stuck, clear the failed resource so Pacemaker retries the start now that the runtime directory exists:
# from any cluster node
pcs resource cleanup ds389
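After the cleanup, Pacemaker retries the start on its own. If you want to wait for that from a script, the following is a minimal sketch that polls Pacemaker's crm_resource --locate until it reports the resource as running. The function name wait_for_resource, the 5-second poll interval and the default 120-second timeout are illustrative assumptions, not TrinityX settings.

```shell
#!/usr/bin/env bash
# Sketch: after `pcs resource cleanup ds389`, poll Pacemaker until the
# resource is reported running, or give up after a timeout in seconds.
# Assumes crm_resource (shipped with Pacemaker) is on PATH.
wait_for_resource() {
    local res="${1:-ds389}" timeout="${2:-120}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # --locate prints "resource <name> is running on: <node>" when started
        if crm_resource --resource "$res" --locate 2>/dev/null | grep -q 'is running on:'; then
            echo "$res running"
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "$res still not running after ${timeout}s" >&2
    return 1
}
```

For example, pcs resource cleanup ds389 && wait_for_resource ds389 120. The resources ordered behind ds389 (mariadb, slurmdbd and so on) start afterwards, so check pcs status once ds389 is up.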
To confirm, run ls -ld /run/dirsrv /run/lock/dirsrv/slapd-local; both should be owned by dirsrv:dirsrv. Thanks to the drop-in, they now reappear automatically after any future reboot of the standby. The drop-in only declares tmpfs runtime directories, and systemd-tmpfiles --create is safe to re-run; nothing here touches the LDAP database, DRBD or ZFS.
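The ls -ld check can also be driven from the drop-in itself, so every declared directory is checked against the owner the drop-in asks for. This is a minimal sketch under stated assumptions: check_rundirs is a hypothetical name, it only looks at plain d lines, it checks existence and user:group but not mode, and it relies on GNU stat as shipped on Rocky/Alma.

```shell
#!/usr/bin/env bash
# Sketch: verify every 'd' entry in a tmpfiles.d drop-in exists and is owned
# by the user:group the drop-in declares. Prints one line per problem and
# returns 1 if anything is wrong, 2 if the drop-in cannot be read.
check_rundirs() {
    local conf="${1:-/etc/tmpfiles.d/dirsrv-local.conf}"
    local type path mode user group rest owner rc=0
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    while read -r type path mode user group rest; do
        [ "$type" = "d" ] || continue   # skip comments, blank lines, other types
        if [ ! -d "$path" ]; then
            echo "missing: $path"
            rc=1
            continue
        fi
        owner=$(stat -c '%U:%G' "$path")   # GNU coreutils stat
        if [ "$owner" != "$user:$group" ]; then
            echo "wrong owner: $path is $owner, expected $user:$group"
            rc=1
        fi
    done < "$conf"
    return "$rc"
}
```

On a fixed standby, check_rundirs should print nothing and return 0.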
15.3 — In HA setup, building of compute-default.yml fails: "This playbook should only be run on the active controller"
Phase: install-time
Applies to TrinityX 15.3. Resolved in 15.3u1.
TASK [gathering Facts] *******************************************************************************
ok: [controller1]
TASK [failed] ****************************************************************************************
fatal: [controller1]: FAILED! => {"changed": false, "msg": "This playbook should only be run on the active controller"}
FIX: If you are sure you are running the playbook on the active controller, rerun it with -e primary=true.
Example:
# ansible-playbook compute-default.yml -e primary=true
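If you are not sure which controller is active, one way to check is to compare the local cluster node name with the node Pacemaker reports for a resource of the Trinity stack. This is a minimal sketch, assuming the node running ds389 is the active controller and that Pacemaker's crm_node and crm_resource are installed; running_node and is_active_controller are hypothetical names, and if your cluster uses a different resource, pass its name as the first argument.

```shell
#!/usr/bin/env bash
# Sketch: decide whether this node is the active controller before passing
# -e primary=true. Assumption: the node running the given resource
# (default ds389) is the active controller.

# Print the node name from `crm_resource --locate` output read on stdin.
running_node() {
    awk -F': ' '/is running on:/ { split($2, a, " "); print a[1]; exit }'
}

# Succeed only if the resource is running on this node.
is_active_controller() {
    local res="${1:-ds389}" here there
    here=$(crm_node -n) || return 2
    there=$(crm_resource --resource "$res" --locate 2>/dev/null | running_node)
    [ -n "$there" ] && [ "$here" = "$there" ]
}
```

Usage: is_active_controller && ansible-playbook compute-default.yml -e primary=true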
Rocky/Alma 9.5 — DRBD kmod install error during HA shared-disk setup
Phase: install-time / HA shared disk
Applies to TrinityX HA installs on Rocky/Alma 9.5. The workaround applies while elrepo's kmod-drbd9x outpaces the available 9.5 kernel; it clears once elrepo aligns with the running kernel.
During installation, the shared-disk part of the HA setup fails with the following error:
Error:
Problem: cannot install the best candidate for the job
- nothing provides kernel >= 5.14.0-570.12.1.el9_6 needed by kmod-drbd9x-9.2.13-5.el9_6.elrepo.x86_64 from elrepo
- ....
- ...
Cause: elrepo is ahead of the Rocky/Alma 9.5 kernel. kmod-drbd9x-9.2.13 is built for el9_6 and requires kernel >= 5.14.0-570.12.1.el9_6, which is later than the kernel installed on a 9.5 system.
FIX: Install kmod-drbd9x-9.1.23 manually, then restart the TrinityX installation:
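To check whether a node is affected, you can compare the kernel requirement from the error message with the installed kernels. This is a minimal sketch: version_ge and installed_kernel_satisfies are hypothetical helpers, and the ordering uses sort -V, which approximates but is not identical to rpm's own version comparison, so treat the result as a hint rather than what dnf will decide.

```shell
#!/usr/bin/env bash
# Sketch: does any installed kernel satisfy a "kernel >= X" requirement?
# Ordering uses `sort -V`, an approximation of rpm version comparison.

# Succeed if version $2 >= version $1.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if at least one installed kernel package is >= $1.
installed_kernel_satisfies() {
    local need="$1" list k
    list=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel) || return 1
    for k in $list; do
        version_ge "$need" "$k" && return 0
    done
    return 1
}
```

For example, installed_kernel_satisfies 5.14.0-570.12.1.el9_6 || echo "kernel too old for kmod-drbd9x-9.2.13". If no installed kernel qualifies, apply the fix below.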
# yum -y install kmod-drbd9x-9.1.23
# ansible-playbook controller.yml
See also: Installation troubleshooting · Operational troubleshooting · Release Notes