Workload management
TrinityX uses OpenHPC for the user space, including SLURM.
SLURM (OpenHPC)
TrinityX is configured to use the default paths for SLURM. The configuration directory /etc/slurm is present on all nodes, but it is a link to /trinity/shared/etc/slurm, where the actual files reside.
For better readability, the configuration has been split up into separate files that are included from the main slurm.conf file.
File | Description
---|---
slurm.conf | Main configuration file
acct_gather.conf | Slurm configuration file for the acct_gather plugins (see [acct_gather.conf](https://slurm.schedmd.com/acct_gather.conf.html))
slurm-health.conf | Health check configuration (where applicable)
slurm-partitions.conf | Partition configuration
topology.conf | Slurm configuration file for defining the network topology (see [topology.conf](https://slurm.schedmd.com/topology.conf.html))
cgroup.conf | Slurm configuration file for the cgroup support (see [cgroup.conf](https://slurm.schedmd.com/cgroup.conf.html))
slurmdbd.conf | Slurm Database Daemon (SlurmDBD) configuration file (see [slurmdbd.conf](https://slurm.schedmd.com/slurmdbd.conf.html))
slurm-nodes.conf | Slurm node configuration file
slurm-user.conf | Slurm user configuration file (e.g. QoS, priorities)
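Of the files above, acct_gather.conf, topology.conf, cgroup.conf and slurmdbd.conf are read directly by Slurm from the configuration directory, while the TrinityX-specific split files are pulled in with Slurm's Include directive. A minimal sketch of what the relevant lines in slurm.conf could look like (the exact set and order on an installed system may differ):

```
# Sketch: include the TrinityX-specific split files from slurm.conf.
# Relative paths are resolved against the directory containing slurm.conf.
Include slurm-health.conf
Include slurm-user.conf
Include slurm-nodes.conf
Include slurm-partitions.conf
```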
By default, Luna generates the configuration for nodes and partitions, where each partition is based on a group name. This method is useful for homogeneous clusters, where a default node entry contains the detailed configuration for CPUs, cores and RAM. When more complexity is desired, e.g. different node types, the automation can be overridden by manual configuration in these files, or the graphical Slurm configurator can be used.
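As an illustration, for a Luna group named compute containing four nodes, the generated partition entry in slurm-partitions.conf could look like the following (node names and limits are hypothetical):

```
# Sketch: one partition per Luna group, named after the group.
PartitionName=compute Nodes=node[001-004] Default=YES MaxTime=INFINITE State=UP
```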
Graphical Slurm configuration application
The graphical Slurm configurator in action:
Automation versus Configuration
The order of configuration is as follows:
- Luna configures Slurm; this is an automated process.
- More advanced configuration is done through the Slurm graphical configurator.
- Custom configuration.
Luna configures
By default, Luna configures the nodes and groups based on the Luna configuration. It adheres to the managed block:
# TrinityX will only manage inside 'TrinityX Managed block'.
# If manual override is desired, the 'TrinityX Managed block' can be removed.
#### TrinityX Managed block start ####
NodeName=node003 # GroupName=compute
NodeName=node002 # GroupName=compute
NodeName=node001 # GroupName=compute
NodeName=node004 # GroupName=compute
#### TrinityX Managed block end ####
No automated changes are made outside the clearly marked managed blocks. This allows for additional configuration if the administrator sees the need for it.
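For example, an administrator could define an additional node below the managed block; Luna will regenerate the block's contents but leave the manual line alone (the extra node below is hypothetical):

```
#### TrinityX Managed block end ####
# Added manually, below the managed block; Luna will not modify or remove it.
NodeName=bignode01 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=768000 State=UNKNOWN
```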
Note: when Luna automation is desired, it is important to keep the following line configured:
NodeName=DEFAULT Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=100 State=UNKNOWN
where the values should match the default node hardware configuration.
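For instance, for compute nodes with two 16-core sockets, two hardware threads per core and 192 GB of RAM, the line could read as follows (values are illustrative; RealMemory is specified in megabytes):

```
# Sketch: DEFAULT entry matching the actual default node hardware.
NodeName=DEFAULT Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=192000 State=UNKNOWN
```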
Slurm Graphical configurator
When using the Slurm graphical configurator, it will ask you to confirm using this approach from now on. This typically happens only once, after which the managed blocks are changed to show that the configuration is no longer managed by Luna.
The same approach remains: no changes are made outside the listed blocks.
Custom configuration
Removing the managed blocks altogether allows for a completely custom configuration. In this case, TrinityX relies on the administrator to configure Slurm.
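As a sketch of what such a fully custom setup might look like for a mixed cluster (all names and sizes below are hypothetical):

```
# slurm-nodes.conf: two node types, maintained by hand.
NodeName=node[001-004] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=192000 State=UNKNOWN
NodeName=fat[001-002] Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=768000 State=UNKNOWN

# slurm-partitions.conf: one partition per node type.
PartitionName=compute Nodes=node[001-004] Default=YES MaxTime=INFINITE State=UP
PartitionName=fat Nodes=fat[001-002] MaxTime=INFINITE State=UP
```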