Installation troubleshooting
For problems that occur after installation, please refer to Operational troubleshooting.
Installation fails due to repository or metadata errors
The installation fails with a message that a repository or its metadata could not be reached, and as a result a package could not be installed:
TASK [trinity/openldap : Install OpenLDAP packages] *****************************************************************************************************************************************
fatal: [controller1]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'plus': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
This happens occasionally because the installation relies on external distribution repositories being available. When one of the mirrors is temporarily unavailable, errors like the one above appear. The best approach is simply to rerun the playbook.
TASK [trinity/chrony : Install chrony packages] *********************************************************************************************************************************************
fatal: [compute-rocky8.osimages.luna]: FAILED! => {"changed": false, "failures": [], "msg": "Unknown Error occurred: Could not run transaction.", "rc": 1, "results": []}
The same applies to the error above: rerun the playbook.
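If these transient failures occur repeatedly, the rerun can be automated. The following is a minimal sketch and not something that ships with TrinityX: retry is a hypothetical shell function that reruns a command until it exits successfully or a maximum number of attempts is reached, pausing between attempts.

```shell
#!/bin/sh
# Hypothetical helper (not part of TrinityX): rerun a command until it
# succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command; stop as soon as it exits with status 0.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, the following would run the playbook up to three times, pausing one minute between attempts. The attempt count and delay are arbitrary choices; the pause simply gives a temporarily unavailable mirror some time to recover:
# retry 3 60 ansible-playbook controller.yml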
Could not bind to openldap
TASK [trinity/sssd : Adding access controler groups to the system] **********************************************************************************************************************
failed: [controller1] (item=admins) => {"ansible_loop_var": "item", "changed": true, "cmd": "/usr/local/sbin/obol group list | grep admins || /usr/local/sbin/obol group add admins", "delta": "0:00:00.195545", "end": "2025-02-10 13:25:41.418735", "item": "admins", "msg": "non-zero return code", "rc": 1, "start": "2025-02-10 13:25:41.223190", "stderr": "[ConnectionError] Failed binding to ldap\n[ConnectionError] Failed binding to ldap", "stderr_lines": ["[ConnectionError] Failed binding to ldap", "[ConnectionError] Failed binding to ldap"], "stdout": "", "stdout_lines": []}
This is almost certainly caused by a previous OpenLDAP configuration still being in place, where a certificate mismatch prevents connecting to the OpenLDAP backend. Make sure that no legacy configuration remains when attempting a complete (re)install of TrinityX.
Details: the installer creates the slapd.d symlink if it is not present, but it does not regenerate existing files or configuration, such as the certificates. Also see the Installation notes. Example of an affected /etc/openldap:
# ls -l /etc/openldap/
total 20
# the clashing certificates:
drwxr-xr-x. 2 root root 4096 Feb 11 01:44 certs
-rw-r--r--. 1 root root 121 Jul 26 2024 check_password.conf
-rw-r--r--. 1 root root 1545 Feb 11 01:45 ldap.conf
-rw-r--r--. 1 root root 900 Apr 30 2024 ldap.conf.ipabkp
drwxr-xr-x. 2 root root 4096 Feb 11 01:48 schema
# this link:
lrwxrwxrwx. 1 root root 35 Feb 11 01:44 slapd.d -> /trinity/local/etc/openldap/slapd.d
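Before a complete (re)install, it may help to check for leftovers like those in the listing above. The sketch below is hypothetical (the function name check_openldap_legacy is ours, not part of TrinityX) and assumes the layout shown above, with a certs directory and a slapd.d symlink under /etc/openldap. It only reports what it finds and changes nothing; which files to move aside depends on your setup.

```shell
#!/bin/sh
# Hypothetical read-only check for leftover OpenLDAP configuration.
# Argument: configuration directory (default: /etc/openldap).
# Prints one line per finding; returns 0 if nothing was found, 1 otherwise.
check_openldap_legacy() {
    dir=${1:-/etc/openldap}
    found=0
    # A slapd.d symlink (e.g. to /trinity/local/etc/openldap/slapd.d)
    # suggests a previous installation placed it.
    if [ -L "$dir/slapd.d" ]; then
        echo "symlink: $dir/slapd.d -> $(readlink "$dir/slapd.d")"
        found=1
    fi
    # Files in certs/ may clash with newly generated certificates.
    if [ -d "$dir/certs" ]; then
        for f in "$dir"/certs/*; do
            [ -e "$f" ] || continue   # glob matched nothing
            echo "certificate file: $f"
            found=1
        done
    fi
    [ "$found" -eq 0 ]
}
```

Running check_openldap_legacy without arguments inspects /etc/openldap.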
DRBD kmod installation error on Alma/Rocky 9.5
During installation, in the shared disk part of the HA setup, the following error may be encountered:
Error:
Problem: cannot install the best candidate for the job
- nothing provides kernel >= 5.14.0-570.12.1.el9_6 needed by kmod-drbd9x-9.2.13-5.el9_6.elrepo.x86_64 from elrepo
- ....
- ...
This happens because ELRepo is ahead of the Rocky/Alma 9.5 kernel versions: kmod-drbd9x-9.2.13 requires kernel 5.14.0-570.12.1.el9_6 or later, which 9.5 does not provide.
Workaround: install kmod-drbd9x-9.1.23 manually and then restart the TrinityX installation:
# yum -y install kmod-drbd9x-9.1.23
# ansible-playbook controller.yml
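To get a rough idea in advance of whether the running kernel already meets the minimum from the error message (5.14.0-570.12.1.el9_6), the two versions can be compared. This is a minimal sketch assuming GNU sort -V; the helper name version_at_least is hypothetical. Version sorting only approximates RPM's own comparison rules, and the dependency is resolved against available kernel packages rather than the running kernel, so treat the result as a hint. If the check fails, the workaround above applies.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $2 is at least version $1,
# using GNU "sort -V" ordering (an approximation of RPM version rules).
version_at_least() {
    required=$1
    current=$2
    # The lower of the two versions sorts first; if that is the required
    # version, the current one is equal or newer.
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

For example:
# version_at_least 5.14.0-570.12.1.el9_6 "$(uname -r)" || echo "kernel too old for kmod-drbd9x-9.2.13"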
Ansible
When following the main branch, the variables in group_vars/all.yml.example may be updated without those changes being incorporated into your group_vars/all.yml. In that case, a message such as the following may appear:
# ansible-playbook controller.yml
PLAY [controllers] *****************************************************************************************************
TASK [Gathering Facts] *************************************************************************************************
fatal: [controller1]: FAILED! => {"msg": "The field 'environment' has an invalid value, which includes an undefined variable. The error was: 'trix_external_fqdn' is undefined. 'trix_external_fqdn' is undefined. 'trix_external_fqdn' is undefined. 'trix_external_fqdn' is undefined"}
To see what needs to be adjusted, compare the two files with diff:
# diff group_vars/all.yml group_vars/all.yml.example
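When the diff is long, it can help to list only the variable names that the example file defines and your own file lacks. This is a minimal sketch with hypothetical helper names (top_level_keys, missing_vars). It assumes variables are top-level YAML keys starting in column 0; nested keys and commented-out lines are not compared, so it complements the diff rather than replacing it.

```shell
#!/bin/sh
# Hypothetical helpers: print top-level variable names defined in the
# example file but missing from your own file. Only keys starting in
# column 0 are compared; nested and commented-out keys are ignored.
top_level_keys() {
    grep -oE '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | tr -d ':' | sort -u
}

missing_vars() {
    example=$1
    actual=$2
    known=$(mktemp)
    top_level_keys "$actual" > "$known"
    # -v: invert, -x: whole-line match, -F: fixed strings, -f: patterns file.
    # "|| true" keeps an empty result from counting as a failure.
    top_level_keys "$example" | grep -vxF -f "$known" || true
    rm -f "$known"
}
```

For the error above, trix_external_fqdn would be expected in the output if it is defined at the top level of the example file but absent from yours:
# missing_vars group_vars/all.yml.example group_vars/all.yml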
Please refer to Updating and Upgrading for more information.