TrinityX and Luna utilities
TrinityX comes with a number of tools to manage the cluster. The main utility is luna, which is used to configure the cluster management daemon. Please refer to the sections Image management, Node and group management and Power management to see the luna CLI in action. Other utilities, which are not directly related to the configuration of a cluster, are part of luna-utils and are outlined below:
lchroot
Lchroot lets you chroot into an OS image. It sets up the environment and ensures that the kernel configured for the image appears as the 'running' version inside the image.
Usage:
# lchroot
osimage need to be specified.
Type 'luna osimage list' to get the list.
Example:
# lchroot compute
IMAGE PATH: /trinity/images/compute
chroot [root@compute /]$
Lchroot is used to chroot into an image, including mounting the sysfs and procfs filesystems. Since an image can have a different kernel version than the controller node, lchroot also mimics the configured version number. To leave the image, simply type exit (or press CTRL+D).
Run luna osimage list to see the valid, configured images.
Make sure to pack the image after the modifications are done (see image management).
lcluster
Lcluster is used to get a quick overview of the cluster health and general status.
It returns the IPMI status, Luna installer status, SLURM status and monitoring health (Sensu).
There are no options or arguments for this tool.
Usage:
# lcluster
Example:
# lcluster
Wait, Fetching IMPI Status of Nodes with https://controller1:7050 ...
+--------------------------------------------------------+
|             << Health & Status of Nodes >>             |
+----+---------+------+-------------------------+--------+
| #  | Node    | IPMI | Luna                    | SLURM  |
+----+---------+------+-------------------------+--------+
| 1  | node001 | ON   | Luna installer: success | IDLE   |
| 2  | node002 | ON   | Luna installer: success | IDLE   |
| 3  | node003 | ON   | Luna installer: success | BUSY   |
| 4  | node004 | ON   | Luna installer: success | DOWN   |
| 5  | node005 | ON   | Luna installer: success | DOWN   |
| 6  | node006 | ON   | Luna installer: success | DOWN   |
| 7  | node007 | ON   | Luna installer: success | DOWN   |
| 8  | node008 | OFF  | Luna installer: success | DOWN   |
| 9  | node009 | ON   | Luna installer: success | draine |
| 10 | node010 | ON   | Luna installer: success | draine |
+----+---------+------+-------------------------+--------+
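For scripted health checks, the table can be filtered on the IPMI and SLURM columns. The following is a minimal sketch, not part of luna-utils: it assumes the five-column layout shown above, and the function names parse_lcluster and nodes_needing_attention are hypothetical. In practice the text would be captured from the lcluster command rather than taken from a sample string.

```python
def parse_lcluster(text):
    """Parse the lcluster table into a list of dicts (layout assumed from the example)."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have exactly five cells and a numeric first cell;
        # borders, the title row and the header row are skipped.
        if len(cells) == 5 and cells[0].isdigit():
            _, node, ipmi, luna, slurm = cells
            rows.append({"node": node, "ipmi": ipmi, "luna": luna, "slurm": slurm})
    return rows


def nodes_needing_attention(rows):
    """Nodes that are powered off, or whose SLURM state is neither IDLE nor BUSY."""
    return [r["node"] for r in rows
            if r["ipmi"] != "ON" or r["slurm"].upper() not in ("IDLE", "BUSY")]


SAMPLE = """\
+----+---------+------+-------------------------+--------+
| # | Node | IPMI | Luna | SLURM |
+----+---------+------+-------------------------+--------+
| 1 | node001 | ON | Luna installer: success | IDLE |
| 3 | node003 | ON | Luna installer: success | BUSY |
| 8 | node008 | OFF | Luna installer: success | DOWN |
| 9 | node009 | ON | Luna installer: success | draine |
+----+---------+------+-------------------------+--------+
"""

if __name__ == "__main__":
    print(nodes_needing_attention(parse_lcluster(SAMPLE)))
```

With the sample above, node008 (powered off) and node009 (draining) are reported.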
lexport
Lexport is a tool to export and import cluster and osimage config/data.
Usage:
# lexport <params>
usage: lexport <-c|-o> <-e|-i> [file]
Luna configuration im/exporter.
positional arguments:
  -c, --cluster     cluster level.
  -o, --osimage     osimage level.
  -e, --export      exports configuration.
  -i, --import      imports configuration/data.
optional arguments:
  file              use file for imports and exports. mandatory when importing.
                    when exporting osimage and no file given, it will render
                    a file based on cluster name, osimage name and date.
                    without --force it will warn if a file will be overwritten.
  -n, --name        used only in combination with osimage operations.
  -m, --matthew     use an external config file during osimage operations, Matthew mode.
                    used for osimage imports and exports. handle with care.
  -h, --help        show this help message and exit.
  -f, --force       do not warn, do not ask, just do it.
examples:
  lexport -c -e /tmp/cluster-config.dat      exports all cluster configuration to /tmp/cluster-config.dat
  lexport -c -e                              exports all cluster configuration and prints to STDOUT
  lexport -c -i /tmp/cluster-config.dat      imports all cluster configuration from /tmp/cluster-config.dat
  lexport -o -e -n compute /tmp/compute.tar  exports compute osimage to compute.tar with embedded configuration
  lexport -o -i /tmp/compute.tar             imports compute.tar with embedded configuration
  lexport -o -i /tmp/compute.tar -p /trinity/images/compute_2
                                             imports compute.tar, using embedded configuration but
                                             overrides path to /trinity/images/compute_2
Example:
# lexport -c -e /tmp/cluster-config.dat
lnode
Lnode is used to inspect and clear the system event log (SEL) of nodes.
# lnode
usage: lnode {list,clear} <host|hostlist>
Luna SEL commands
positional arguments:
{list,clear} sub-command help
list list all the SEL entries for one node
clear clear all the SEL entries for one or more nodes
options:
-h, --help show this help message and exit
Example:
# lnode list node001
1 | 02/21/2024 | 13:13:11 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
2 | 03/01/2024 | 09:29:24 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
3 | 03/01/2024 | 09:29:27 | Power Supply PS2 Status | Failure detected () | Asserted
...
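The SEL entries are pipe-separated, which makes them easy to post-process. The sketch below is an illustration only: it assumes the six-field layout of the example above and a month/day/year date, and the field names (id, when, sensor, event, state) are labels chosen here, not names defined by lnode.

```python
from datetime import datetime


def parse_sel(text):
    """Split 'lnode list' output into dicts; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 6 or not parts[0].isdigit():
            continue  # e.g. the "..." continuation marker or an empty line
        entry_id, date, time, sensor, event, state = parts
        entries.append({
            "id": int(entry_id),
            # Assumption: dates are printed as MM/DD/YYYY.
            "when": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return entries


SAMPLE = """\
1 | 02/21/2024 | 13:13:11 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
2 | 03/01/2024 | 09:29:24 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
3 | 03/01/2024 | 09:29:27 | Power Supply PS2 Status | Failure detected () | Asserted
...
"""

if __name__ == "__main__":
    for entry in parse_sel(SAMPLE):
        if entry["sensor"].startswith("Power Supply"):
            print(entry["when"].isoformat(), entry["sensor"], entry["event"])
```

Filtering on the sensor name, as done here for power supplies, is one way to pick out the entries worth acting on before clearing the log.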
lpower
Lpower is used to control the power state of the configured nodes. See power management. The utility takes the python-hostlist notation (e.g. node[001-004]).
# lpower
usage: lpower [-h] [--rack|-r RACKNAME] [--group|-g GROUP]
[hosts] {status,on,off,reset,cycle,identify,noidentify}
BMC power management.
positional arguments:
hosts Host list. Any combination of:
node[x-y],
nodex,nodey,...
nodex
{status,on,off,reset,cycle,identify,noidentify}
Action
optional arguments:
-h, --help show this help message and exit
-g GROUP, --group GROUP perform the action on nodes of the group
-r RACK, --rack RACK perform the action on nodes inside the rack
Example:
# lpower node[001-004] on
# lpower -g compute on
| Command | Description |
|---|---|
| status | Returns the current power status of the node |
| on | Sends a power on signal to the node |
| off | Sends a power off signal to the node |
| reset | Sends a chassis power reset to the node (hard reset) |
| cycle | Power cycles the node with a power off interval of at least 1 second (see ipmitool) |
| identify | Turns on the identification LED on the node (where supported) |
| noidentify | Turns off the identification LED on the node (where supported) |
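To make the host list notation concrete, the sketch below expands the three forms listed in the lpower help (node[x-y], a comma-separated list, and a single name) into individual host names. It is a simplified illustration and not the python-hostlist library, which supports more syntax; the function name expand_hosts is hypothetical.

```python
import re

# One bracketed group per name, e.g. node[001-004] or node[001-002,007].
_RANGE = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def _split_top_level(expr):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hosts(expr):
    """Expand a simple host list expression into a list of host names."""
    hosts = []
    for part in _split_top_level(expr):
        match = _RANGE.match(part)
        if not match:
            hosts.append(part)  # plain host name
            continue
        for item in match["body"].split(","):
            if "-" in item:
                start, end = item.split("-", 1)
                width = len(start)  # keep the zero padding of the range start
                for n in range(int(start), int(end) + 1):
                    hosts.append(f"{match['prefix']}{n:0{width}d}{match['suffix']}")
            else:
                hosts.append(f"{match['prefix']}{item}{match['suffix']}")
    return hosts


if __name__ == "__main__":
    print(expand_hosts("node[001-004]"))
    print(expand_hosts("node[001-002,007],gpu01"))
```

The zero padding of the range start is preserved, so node[001-004] expands to node001 through node004.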
lrack
Lrack manages racks and the placement of equipment inside them from the command line, providing the same capabilities as the Rack View Open OnDemand application. It operates on racks and on the device inventory (nodes, switches, other devices and controllers). Device arguments accept the python-hostlist notation (e.g. node[001-004]).
# lrack --help
usage: lrack [-h] [-V] [-v] [-R] [-e [FILE] | -i FILE] [-r RACK] [-f]
{list,show,add,change,rename,remove,place,unplace,resize,orient,inventory,pool}
...
Manage racks and the placement of devices inside them.
options:
-h, --help show this help message and exit
-V, --version show program's version number and exit
-v, --verbose verbose mode
-R, --raw raw JSON output
bulk import/export (JSON):
-e [FILE], --export [FILE]
export rack layout as JSON to FILE (STDOUT if omitted)
-i FILE, --import FILE
import rack layout from a JSON FILE
-r RACK, --rack RACK limit export to a single rack
-f, --force overwrite existing export file / allow overlap on import
Subcommands
| Command | Description |
|---|---|
| list | List all racks with their utilisation (used/free U, device count) |
| show | Show racks as an ASCII elevation (see below) |
| add | Create a rack (-s size in U, -d order, -m room, -t site) |
| change | Change rack properties |
| rename | Rename a rack |
| remove | Delete a rack; its devices return to the pool |
| place | Place device(s) into a rack (-r rack, -p position, -o orientation, -H height, -f force) |
| unplace | Remove device(s) from their rack |
| resize | Set a device height in U (-H) |
| orient | Set a device orientation, front or back (-o) |
| inventory | List the device inventory (configured/unconfigured subset) |
| pool | List unconfigured devices available for placement |
When place is given no -p/--position, the device(s) auto-fill the first free slots, following the rack numbering order. A placement that would exceed the rack size is declined; an overlap requires -f/--force.
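The placement rules above can be sketched as follows. This is an illustration of the described behaviour, not lrack's implementation: it assumes an ascending rack (U1 is filled first), models a rack as a set of occupied U numbers, and the names place, first_free_position and PlacementError are hypothetical. What lrack does when no free slot is large enough is not stated above; raising an error is an assumption of this sketch.

```python
class PlacementError(Exception):
    """Raised when a placement is declined."""


def first_free_position(size, occupied, height):
    """Lowest U at which `height` consecutive slots are free, or None."""
    for start in range(1, size - height + 2):
        if all(u not in occupied for u in range(start, start + height)):
            return start
    return None


def place(size, occupied, height=1, position=None, force=False):
    """Place a device and return its U position; `occupied` is updated in place."""
    if height < 1:
        raise PlacementError("height must be at least 1U")
    if position is None:
        # No position given: auto-fill the first free slot.
        position = first_free_position(size, occupied, height)
        if position is None:
            raise PlacementError("no free slot large enough")
    wanted = set(range(position, position + height))
    if position < 1 or max(wanted) > size:
        raise PlacementError("placement exceeds rack size")  # always declined
    if wanted & occupied and not force:
        raise PlacementError("overlap; allowed only with force")
    occupied |= wanted
    return position


if __name__ == "__main__":
    rack = set()
    print(place(12, rack))            # first device, auto-filled
    print(place(12, rack, height=4))  # 4U device, auto-filled above it
    print(sorted(rack))
```

Note that exceeding the rack size is rejected unconditionally, while an overlap is only rejected when force is not set, mirroring the two rules in the paragraph above.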
Examples:
# lrack add rack01 -s 42 -m DC1 -t AMS
# lrack place node[001-020] -r rack01 -p 1
# lrack place gpu01 -r rack01 # auto-stack into the first free slot
# lrack resize jbod01 -H 4
# lrack orient sw01 -o back
# lrack list
Easy syntax
For quick, scriptable changes there is a positional shorthand:
# lrack node[001-020] in rack01 # place, auto-stacking into free slots
# lrack node001 in rack01 at 5 back # place at U5, orientation back
# lrack node001 out # unplace
# lrack rack01 # bare rack name shows its elevation
Viewing racks
lrack show adapts the level of detail to the number of racks and the terminal width:
| Racks | View |
|---|---|
| 1 | full ASCII elevation |
| 2-5 | side-by-side elevations (wrapping to bands) |
| more than 5 | one fill-gauge line per rack, with totals |
A single rack is drawn as a full elevation. Front-facing equipment is shown in green and rear-facing equipment in yellow; empty slots are shaded:
# lrack show rack01
rack01 · site AMS · room DC1 · 12U · ascending
   ┌────────────────────────────────────────────────┐
12 │██ sw01 switch Mellanox 1U                      │ ◀ back
11 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
10 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 9 │██                                              │
 8 │██                                              │
 7 │██                                              │
 6 │██                                              │
 5 │██ gpu01 node Supermicro 4U                     │ ▶ front
 4 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 3 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 2 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 1 │██ node001 node Dell 1U                         │ ▶ front
   └────────────────────────────────────────────────┘
used 8U · free 4U · 4 devices
A handful of racks are drawn side by side:
# lrack show rack01 rack02 rack03
  rack01 2/12U                     rack02 4/12U                     rack03 0/12U
  ┌───────────────────────────┐    ┌───────────────────────────┐    ┌───────────────────────────┐
12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│  12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│  12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 .  (rows omitted)                .                                .
 4│░░░░░░░░░░░░░░░░░░░░░░░░░░░│   4│██                         │   4│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 3│░░░░░░░░░░░░░░░░░░░░░░░░░░░│   3│██                         │   3│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 2│██ node002 node 1U         │   2│██                         │   2│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 1│██ node001 node 1U         │   1│██ gpu01 node 4U           │   1│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
  └───────────────────────────┘    └───────────────────────────┘    └───────────────────────────┘
Beyond five racks lrack show switches to a one-line fill gauge per rack with totals, which stays readable for whole data centres:
# lrack show
rack01 AMS/DC1 42U [███████████████████░░░░░] 79% 22 dev
rack02 AMS/DC1 42U [██████░░░░░░░░░░░░░░░░░░] 24% 4 dev
rack03 AMS/DC2 42U [███████████░░░░░░░░░░░░░] 45% 10 dev
rack04 AMS/DC2 42U [█░░░░░░░░░░░░░░░░░░░░░░░] 5% 2 dev
rack05 AMS/DC1 42U [█████████████░░░░░░░░░░░] 52% 8 dev
rack06 AMS/DC2 42U [█████████████░░░░░░░░░░░] 55% 15 dev
rack07 AMS/DC2 42U [███░░░░░░░░░░░░░░░░░░░░░] 14% 5 dev
rack08 AMS/DC2 42U [█████████████████░░░░░░░] 71% 16 dev
────── 8 racks · 145U used / 336U total · 43%
Named racks (lrack show rack01 rack02) always render as elevations. The level of detail can be forced with -F/--full, -s/--summary or -M/--map (a per-U heatmap), and the layout can be tuned with -c/--columns N and -w/--width N.
Bulk import and export
The complete rack layout (and the device inventory) round-trips as JSON, in the same style as lexport:
# lrack -e # export everything to STDOUT
# lrack -e layout.json # export to a file (-f to overwrite)
# lrack -e -r rack01 rack01.json # export a single rack
# lrack -i layout.json # import and apply a layout
Import is idempotent: existing racks and placements are updated. The layout is validated before anything is applied, so a placement that exceeds its rack size is declined and nothing is changed.
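The validate-before-apply behaviour can be sketched as below. The JSON schema used by lrack is not shown in this section, so the dictionary structure here (rack name mapping to a size and a list of devices with name, position and height) is invented purely for illustration, as are the function names validate and apply_layout.

```python
import copy


def validate(layout):
    """Return a list of problems; an empty list means the layout is acceptable."""
    problems = []
    for rack, spec in layout.items():
        for dev in spec["devices"]:
            top = dev["position"] + dev["height"] - 1
            if dev["position"] < 1 or top > spec["size"]:
                problems.append(
                    f"{dev['name']} does not fit in {rack} ({spec['size']}U)")
    return problems


def apply_layout(current, layout):
    """Validate the whole layout first; change `current` only if nothing is wrong."""
    problems = validate(layout)
    if problems:
        return problems  # declined: nothing has been changed
    for rack, spec in layout.items():
        # Create or update: applying the same layout twice gives the same result.
        current[rack] = copy.deepcopy(spec)
    return []


if __name__ == "__main__":
    state = {}
    layout = {
        "rack01": {"size": 12, "devices": [
            {"name": "node001", "position": 1, "height": 1},
            {"name": "gpu01", "position": 10, "height": 4},  # would reach U13
        ]},
    }
    print(apply_layout(state, layout))
    print(state)  # still empty, because validation failed
```

Because all checks run before the first change is made, one bad placement leaves every rack untouched, including racks whose own entries were valid.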
Tab-completion of subcommands, options and live rack and device names is available out of the box.
lmaster
Lmaster is a utility to view the HA status of the luna daemons and to set the HA master.
# lmaster -h
usage: lmaster [-h|-s|-w|-a]
Gets Luna2 master state of controller, based on utils luna.ini config
optional arguments:
-h, --help show this help message and exit
-s, --set sets master state for controller configured as endpoint in luna.ini
in most cases it's the controller where this command is invoked
-w, --who tells who of the controllers is master
-a, --all returns current HA values of all controllers
Examples:
# lmaster
Configured endpoint is ha2-controller1
ha2-controller1 is the master
# lmaster -a
Configured endpoint is ha2-controller1
ha2-controller1: enabled: True master: True insync: True syncimages: True overrule: False shadow: False
ha2-controller2: enabled: True master: False insync: True syncimages: True overrule: False shadow: False
Setting the current controller as master:
# lmaster -s
Configured endpoint is ha2-controller1
current role set to master
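The output of lmaster -a lends itself to a quick sanity check, for example that exactly one controller is master and that all controllers are in sync. The sketch below assumes the "name: key: value key: value" line format shown above; the function name parse_lmaster_all is hypothetical and not part of luna-utils.

```python
import re


def parse_lmaster_all(text):
    """Turn 'lmaster -a' output into {controller: {flag: bool}} (format assumed)."""
    status = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(": ")
        pairs = re.findall(r"(\w+): (True|False)", rest)
        # Lines without "name: flag: value" pairs (e.g. the endpoint line) are skipped.
        if sep and pairs:
            status[name.strip()] = {key: value == "True" for key, value in pairs}
    return status


SAMPLE = """\
Configured endpoint is ha2-controller1
ha2-controller1: enabled: True master: True insync: True syncimages: True overrule: False shadow: False
ha2-controller2: enabled: True master: False insync: True syncimages: True overrule: False shadow: False
"""

if __name__ == "__main__":
    status = parse_lmaster_all(SAMPLE)
    masters = [name for name, flags in status.items() if flags["master"]]
    print("masters:", masters)
    print("all in sync:", all(flags["insync"] for flags in status.values()))
```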
bootutil
Bootutil inspects and changes the UEFI/BIOS boot order of a node through its BMC's Redfish interface. It talks directly to the node's BMC rather than through Luna, so --host is the BMC's Redfish endpoint (including the protocol) and the user and password are the BMC credentials.
# bootutil -h
Usage: bootutil [options...] <mode>
<mode> can be either:
list -- list available boot options
get -- get current boot order
set <order> -- set current boot order
Available [options...]:
-H, --host -- Redfish host. Must include protocol, e.g. https://host
-U, --user -- HTTP user name
-P, --password -- HTTP user password
| Mode | Description |
|---|---|
| list | Lists the available boot devices, each with its ID, name and description. |
| get | Shows the current boot order, in sequence, by boot-option ID. |
| set <order> | Sets the boot order. The order is a quoted, space-separated list of the boot-option IDs (as shown by list or get), in the desired order. |
List the available boot devices of a node's BMC:
# bootutil -H https://10.148.0.1 -U admin -P <password> list
Available boot devices:
ID |Name |Desc
------+----------------+------------------------------------------------------
Boot0000|Hard Drive |UEFI Hard Drive
Boot0001|Network |UEFI PXE Network
Boot0002|UEFI Shell |UEFI Shell
Show the current boot order:
# bootutil -H https://10.148.0.1 -U admin -P <password> get
Current boot order:
1 - Boot0000 UEFI Hard Drive
2 - Boot0001 UEFI PXE Network
Put network boot first, followed by the hard drive:
# bootutil -H https://10.148.0.1 -U admin -P <password> set "Boot0001 Boot0000"
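When scripting this, the important detail is that the boot order is passed to set as one quoted argument. The sketch below derives a network-first order from the list output and builds the argument list without running anything. It assumes the ID|Name|Desc layout shown above; the helper names are hypothetical and the password is a placeholder.

```python
import shlex


def parse_boot_devices(text):
    """Parse 'bootutil ... list' output into dicts (layout assumed from the example)."""
    devices = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Keep only rows whose first cell looks like a boot-option ID.
        if len(cells) == 3 and cells[0].startswith("Boot"):
            devices.append({"id": cells[0], "name": cells[1], "desc": cells[2]})
    return devices


def order_with_first(devices, first_name):
    """Boot-option IDs with the device named `first_name` moved to the front."""
    first = [d["id"] for d in devices if d["name"] == first_name]
    rest = [d["id"] for d in devices if d["name"] != first_name]
    return first + rest


def set_command(host, user, password, order):
    """Argument list for bootutil; the whole order is ONE argument."""
    return ["bootutil", "-H", host, "-U", user, "-P", password,
            "set", " ".join(order)]


SAMPLE = """\
Available boot devices:
ID |Name |Desc
------+----------------+------------------------------------------------------
Boot0000|Hard Drive |UEFI Hard Drive
Boot0001|Network |UEFI PXE Network
Boot0002|UEFI Shell |UEFI Shell
"""

if __name__ == "__main__":
    order = order_with_first(parse_boot_devices(SAMPLE), "Network")
    cmd = set_command("https://10.148.0.1", "admin", "<password>", order)
    # shlex.join shows the command as it would be typed, with shell quoting applied.
    print(shlex.join(cmd))
```

Passing the command as an argument list (for example to subprocess.run) avoids having to quote the order by hand.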