TrinityX and Luna utilities

TrinityX comes with a number of tools to manage the cluster. The main utility is luna, which configures the cluster management daemon. Refer to the sections Image management, Node and group management, and Power management to see the luna CLI in action. Other utilities, which are not directly related to the configuration of a cluster, are part of luna-utils and are outlined below.

lchroot

Lchroot chroots into an osimage. It sets up the environment and ensures that the kernel configured for the image is reported as the 'running' version inside the image.

Usage:

# lchroot
osimage need to be specified.
Type 'luna osimage list' to get the list.

Example:

# lchroot compute
IMAGE PATH: /trinity/images/compute
chroot [root@compute /]$

Lchroot chroots into an image and mounts the sysfs and procfs filesystems inside it. Since an image can have a different kernel version than the controller node, lchroot also mimics the configured kernel version number. To leave the image, type exit (or press CTRL+D).

Run luna osimage list to get the list of configured images.

Remember to pack the image after the modifications are done (see Image management).

lcluster

Lcluster is used to get a quick overview of the cluster health and general status.

It reports the IPMI status, Luna installer status, SLURM status and monitoring health (Sensu).

There are no options or arguments for this tool.

Usage:

# lcluster

Example:

# lcluster
Wait, Fetching IMPI Status of Nodes with https://controller1:7050 ...
+--------------------------------------------------------+
|             << Health & Status of Nodes >>             |
+----+---------+------+-------------------------+--------+
| #  |   Node  | IPMI |           Luna          | SLURM  |
+----+---------+------+-------------------------+--------+
| 1  | node001 |  ON  | Luna installer: success |  IDLE  |
| 2  | node002 |  ON  | Luna installer: success |  IDLE  |
| 3  | node003 |  ON  | Luna installer: success |  BUSY  |
| 4  | node004 |  ON  | Luna installer: success |  DOWN  |
| 5  | node005 |  ON  | Luna installer: success |  DOWN  |
| 6  | node006 |  ON  | Luna installer: success |  DOWN  |
| 7  | node007 |  ON  | Luna installer: success |  DOWN  |
| 8  | node008 |  OFF | Luna installer: success |  DOWN  |
| 9  | node009 |  ON  | Luna installer: success | draine |
| 10 | node010 |  ON  | Luna installer: success | draine |
+----+---------+------+-------------------------+--------+
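
For scripting, the table printed by lcluster can be turned into records. The following is a minimal Python sketch, assuming the output keeps the layout shown above; the function names parse_lcluster and nodes_needing_attention are hypothetical helpers and not part of luna-utils.

```python
# A shortened copy of the lcluster output shown above.
SAMPLE = """\
+--------------------------------------------------------+
|             << Health & Status of Nodes >>             |
+----+---------+------+-------------------------+--------+
| #  |   Node  | IPMI |           Luna          | SLURM  |
+----+---------+------+-------------------------+--------+
| 1  | node001 |  ON  | Luna installer: success |  IDLE  |
| 4  | node004 |  ON  | Luna installer: success |  DOWN  |
| 8  | node008 |  OFF | Luna installer: success |  DOWN  |
+----+---------+------+-------------------------+--------+
"""


def parse_lcluster(text):
    """Turn the ASCII table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip borders, the banner line and anything that is not a table row.
        if not line.startswith("|") or "<<" in line:
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first real row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def nodes_needing_attention(rows):
    """Nodes that are not powered on or that SLURM reports as DOWN."""
    return [r["Node"] for r in rows if r["IPMI"] != "ON" or r["SLURM"] == "DOWN"]


if __name__ == "__main__":
    print(nodes_needing_attention(parse_lcluster(SAMPLE)))
```

In practice the text would come from running lcluster (for example through subprocess) instead of the embedded sample.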

lexport

Lexport is a tool to export and import cluster and osimage config/data.

Usage:

# lexport <params>

usage: lexport <-c|-o> <-e|-i> [file]

Luna configuration im/exporter.

positional arguments:
  -c, --cluster         cluster level.
  -o, --osimage         osimage level.
  -e, --export          exports configuration.
  -i, --import          imports configuration/data.

optional arguments:
  file                  use file for imports and exports. mandatory when importing.
                        when exporting osimage and no file given, it will render
                        a file based on cluster name, osimage name and date.
                        without --force it will warn if a file will be overwritten.
  -n, --name            used only in combination with osimage operations.
  -m, --matthew         use an external config file during osimage operations, Matthew mode. 
                        used for osimage imports and exports. handle with care.
  -h, --help            show this help message and exit.
  -f, --force           do not warn, do not ask, just do it.

examples:
  lexport -c -e /tmp/cluster-config.dat     exports all cluster configuration to /tmp/cluster-config.dat
  lexport -c -e                             exports all cluster configuration and prints to STDOUT
  lexport -c -i /tmp/cluster-config.dat     imports all cluster configuration from /tmp/cluster-config.dat
  lexport -o -e -n compute /tmp/compute.tar exports compute osimage to compute.tar with embedded configuration
  lexport -o -i /tmp/compute.tar            imports compute.tar with embedded configuration
  lexport -o -i /tmp/compute.tar -p /trinity/images/compute_2    
                                            imports compute.tar, using embedded configuration but
                                            overrides path to /trinity/images/compute_2

Example:

# lexport -c -e /tmp/cluster-config.dat

lnode

Lnode is used to list and clear the system event log (SEL) of nodes.

# lnode
usage: lnode {list,clear} <host|hostlist>

Luna SEL commands

positional arguments:
  {list,clear}  sub-command help
    list        list all the SEL entries for one node
    clear       clear all the SEL entries for one or more nodes

options:
  -h, --help    show this help message and exit

Example:

# lnode list node001
 1 | 02/21/2024 | 13:13:11 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
 2 | 03/01/2024 | 09:29:24 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
 3 | 03/01/2024 | 09:29:27 | Power Supply PS2 Status | Failure detected () | Asserted
...
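
The SEL listing is pipe-separated, which makes it easy to filter. The sketch below is one possible way to parse it in Python, assuming six fields per line as in the example above and dates in MM/DD/YYYY format; parse_sel is a hypothetical helper name.

```python
from datetime import datetime

# The three entries from the example above.
SAMPLE = """\
 1 | 02/21/2024 | 13:13:11 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
 2 | 03/01/2024 | 09:29:24 | Physical Security Chassis Intru | General Chassis intrusion () | Asserted
 3 | 03/01/2024 | 09:29:27 | Power Supply PS2 Status | Failure detected () | Asserted
...
"""


def parse_sel(text):
    """Parse 'lnode list' style lines into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip blank lines, the "..." marker and anything unexpected.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        entry_id, date, time, sensor, event, state = fields
        entries.append({
            "id": int(entry_id),
            # Assumption: dates are MM/DD/YYYY (02/21/2024 cannot be DD/MM).
            "timestamp": datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S"),
            "sensor": sensor,
            "event": event,
            "state": state,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_sel(SAMPLE):
        if "Power Supply" in entry["sensor"]:
            print(entry["id"], entry["timestamp"].isoformat(), entry["event"])
```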

lpower

Lpower is used to control the power state of the configured nodes (see Power management). The utility accepts the python-hostlist notation (e.g. node[001-004]).

# lpower

usage: lpower [-h] [--rack|-r RACKNAME] [--group|-g GROUP]
              [hosts] {status,on,off,reset,cycle,identify,noidentify}

BMC power management.

positional arguments:
  hosts                     Host list. Any combination of: 
                               node[x-y],
                               nodex,nodey,...
                               nodex
  {status,on,off,reset,cycle,identify,noidentify}
                            Action

optional arguments:
  -h, --help                show this help message and exit
  -g GROUP, --group GROUP   perform the action on nodes of the group
  -r RACK, --rack RACK      perform the action on nodes inside the rack

Example:

# lpower node[001-004] on
# lpower -g compute on

Command     Description
status      Returns the current power status of the node
on          Sends a power on signal to the node
off         Sends a power off signal to the node
reset       Sends a chassis power reset to the node (hard reset)
cycle       Powers the node off and back on, with an off interval of at least 1 second (see ipmitool)
identify    Turns on the identification LED on the node (where supported)
noidentify  Turns off the identification LED on the node (where supported)
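
To illustrate what the hostlist notation stands for, here is a simplified Python sketch that expands expressions such as node[001-004] into individual host names. It is not the python-hostlist implementation and only covers a single bracket group per name; expand_hostlist as written here is a local, illustrative function.

```python
import re

# prefix[body]suffix, with exactly one bracket group
_BRACKET = re.compile(r"^([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)$")


def _split_top_level(expr):
    """Split on commas that are not inside [...]."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(expr):
    """Expand 'node[001-003],login1' into a flat list of host names."""
    hosts = []
    for part in _split_top_level(expr):
        match = _BRACKET.match(part)
        if not match:
            hosts.append(part)  # plain host name, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for item in body.split(","):
            low, _, high = item.partition("-")
            high = high or low      # a single number such as "010"
            width = len(low)        # keep zero padding: 001 -> 3 digits
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("node[001-004]"))
```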

lrack

Added in TrinityX 16.

Lrack manages racks and the placement of equipment inside them from the command line, providing the same capabilities as the Rack View Open OnDemand application. It operates on racks and on the device inventory (nodes, switches, other devices and controllers). Device arguments accept the python-hostlist notation (e.g. node[001-004]).

# lrack --help

usage: lrack [-h] [-V] [-v] [-R] [-e [FILE] | -i FILE] [-r RACK] [-f]
             {list,show,add,change,rename,remove,place,unplace,resize,orient,inventory,pool}
             ...

Manage racks and the placement of devices inside them.

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -v, --verbose         verbose mode
  -R, --raw             raw JSON output

bulk import/export (JSON):
  -e [FILE], --export [FILE]
                        export rack layout as JSON to FILE (STDOUT if omitted)
  -i FILE, --import FILE
                        import rack layout from a JSON FILE
  -r RACK, --rack RACK  limit export to a single rack
  -f, --force           overwrite existing export file / allow overlap on import

Subcommands

Command    Description
list       List all racks with their utilisation (used/free U, device count)
show       Show racks as an ASCII elevation (see below)
add        Create a rack (-s size in U, -d order, -m room, -t site)
change     Change rack properties
rename     Rename a rack
remove     Delete a rack; its devices return to the pool
place      Place device(s) into a rack (-r rack, -p position, -o orientation, -H height, -f force)
unplace    Remove device(s) from their rack
resize     Set a device height in U (-H)
orient     Set a device orientation, front or back (-o)
inventory  List the device inventory (configured/unconfigured subset)
pool       List unconfigured devices available for placement

When place is given no -p/--position, the device(s) auto-fill the first free slots, following the rack numbering order. A placement that would exceed the rack size is declined; an overlap requires -f/--force.
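
The placement rules above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not lrack's actual code: it assumes ascending rack numbering, models a rack as a set of occupied U positions, and the names first_free_position and place are hypothetical.

```python
def first_free_position(rack_size, occupied, height):
    """Lowest start U with `height` consecutive free slots, or None."""
    for start in range(1, rack_size - height + 2):
        if all(u not in occupied for u in range(start, start + height)):
            return start
    return None


def place(rack_size, occupied, height, position=None, force=False):
    """Return (position, new_occupied) or raise ValueError if declined."""
    if position is None:
        # No position given: auto-fill the first free slots.
        position = first_free_position(rack_size, occupied, height)
        if position is None:
            raise ValueError("no free run of slots large enough")
    top = position + height - 1
    if position < 1 or top > rack_size:
        # Exceeding the rack size is always declined, even with force.
        raise ValueError("placement exceeds the rack size")
    span = set(range(position, top + 1))
    if span & occupied and not force:
        raise ValueError("overlap, force required")
    return position, occupied | span


if __name__ == "__main__":
    used = {1, 5, 6, 7, 8, 12}       # U positions already taken in a 12U rack
    print(place(12, used, 2))         # auto-fill a 2U device
```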

Examples:

# lrack add rack01 -s 42 -m DC1 -t AMS
# lrack place node[001-020] -r rack01 -p 1
# lrack place gpu01 -r rack01            # auto-stack into the first free slot
# lrack resize jbod01 -H 4
# lrack orient sw01 -o back
# lrack list

Easy syntax

For quick, scriptable changes there is a positional shorthand:

# lrack node[001-020] in rack01            # place, auto-stacking into free slots
# lrack node001 in rack01 at 5 back        # place at U5, orientation back
# lrack node001 out                        # unplace
# lrack rack01                             # bare rack name shows its elevation

Viewing racks

lrack show adapts the level of detail to the number of racks and the terminal width:

Racks        View
1            full ASCII elevation
2-5          side-by-side elevations (wrapping to bands)
more than 5  one fill-gauge line per rack, with totals

A single rack is drawn as a full elevation. Front-facing equipment is shown in green and rear-facing equipment in yellow; empty slots are shaded:

# lrack show rack01

rack01   ·  site AMS  ·  room DC1  ·  12U  ·  ascending
     ┌────────────────────────────────────────────────┐
  12 │██ sw01           switch       Mellanox     1U  │  ◀ back
  11 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
  10 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
   9 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
   8 │██                                              │
   7 │██                                              │
   6 │██                                              │
   5 │██ gpu01          node         Supermicro   4U  │  ▶ front
   4 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
   3 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
   2 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
   1 │██ node001        node         Dell         1U  │  ▶ front
     └────────────────────────────────────────────────┘
     used 6U  ·  free 6U  ·  3 devices

A handful of racks are drawn side by side:

# lrack show rack01 rack02 rack03

rack01  2/12U                    rack02  4/12U                    rack03  0/12U
  ┌───────────────────────────┐    ┌───────────────────────────┐    ┌───────────────────────────┐
12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│  12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│  12│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 .  (rows omitted)              .                                .
 4│░░░░░░░░░░░░░░░░░░░░░░░░░░░│   4│██                         │   4│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 3│░░░░░░░░░░░░░░░░░░░░░░░░░░░│   3│██                         │   3│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 2│██ node002     node    1U  │   2│██                         │   2│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
 1│██ node001     node    1U  │   1│██ gpu01       node    4U  │   1│░░░░░░░░░░░░░░░░░░░░░░░░░░░│
  └───────────────────────────┘    └───────────────────────────┘    └───────────────────────────┘

Beyond five racks, lrack show switches to a one-line fill gauge per rack with totals, which stays readable for whole data centres:

# lrack show

  rack01  AMS/DC1       42U  [███████████████████░░░░░]  79%   22 dev
  rack02  AMS/DC1       42U  [██████░░░░░░░░░░░░░░░░░░]  24%    4 dev
  rack03  AMS/DC2       42U  [███████████░░░░░░░░░░░░░]  45%   10 dev
  rack04  AMS/DC2       42U  [█░░░░░░░░░░░░░░░░░░░░░░░]   5%    2 dev
  rack05  AMS/DC1       42U  [█████████████░░░░░░░░░░░]  52%    8 dev
  rack06  AMS/DC2       42U  [█████████████░░░░░░░░░░░]  55%   15 dev
  rack07  AMS/DC2       42U  [███░░░░░░░░░░░░░░░░░░░░░]  14%    5 dev
  rack08  AMS/DC2       42U  [█████████████████░░░░░░░]  71%   16 dev
  ──────  8 racks · 145U used / 336U total · 43%
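
A fill gauge of this kind is simple arithmetic on used and total U. The Python sketch below shows one way to render it; the 24-cell width and the rounding to the nearest cell are assumptions read off the sample output, and fill_gauge is a hypothetical name rather than an lrack function.

```python
def fill_gauge(used, total, width=24):
    """Render a bar such as '[██████░░░░]  43%' for used/total U."""
    fraction = used / total if total else 0.0
    # Round half up explicitly; the built-in round() rounds halves to even.
    filled = int(fraction * width + 0.5)
    percent = int(fraction * 100 + 0.5)
    return "[" + "█" * filled + "░" * (width - filled) + f"] {percent:3d}%"


if __name__ == "__main__":
    # Totals line of the example above: 145U used out of 336U.
    print(fill_gauge(145, 336))
```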

Named racks (lrack show rack01 rack02) always render as elevations. The level can be forced with -F/--full, -s/--summary or -M/--map (a per-U heatmap), and the layout tuned with -c/--columns N and -w/--width N.

Bulk import and export

The complete rack layout (and the device inventory) round-trips as JSON, in the same style as lexport:

# lrack -e                       # export everything to STDOUT
# lrack -e layout.json           # export to a file (-f to overwrite)
# lrack -e -r rack01 rack01.json # export a single rack
# lrack -i layout.json           # import and apply a layout

Import is idempotent: existing racks and placements are updated. The layout is validated before anything is applied, so a placement that exceeds its rack size is declined and nothing is changed.

Tab-completion of subcommands, options and live rack and device names is available out of the box.

lmaster

Lmaster is a utility to view the HA status of the luna daemons and to set the HA master.

# lmaster -h

usage: lmaster [-h|-s|-w|-a]

Gets Luna2 master state of controller, based on utils luna.ini config

optional arguments:
  -h, --help            show this help message and exit
  -s, --set             sets master state for controller configured as endpoint in luna.ini
                        in most cases it's the controller where this command is invoked
  -w, --who             tells who of the controllers is master
  -a, --all             returns current HA values of all controllers

Examples:

# lmaster
Configured endpoint is ha2-controller1
ha2-controller1 is the master

# lmaster -a
Configured endpoint is ha2-controller1
ha2-controller1:   enabled: True   master: True   insync: True   syncimages: True   overrule: False  shadow: False  
ha2-controller2:   enabled: True   master: False  insync: True   syncimages: True   overrule: False  shadow: False

Setting the current controller as master:

# lmaster -s
Configured endpoint is ha2-controller1
current role set to master
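
For monitoring scripts, the lmaster -a output shown above can be parsed into per-controller flags. This Python sketch assumes the "key: True/False" layout of that example; parse_lmaster_all is a hypothetical helper name.

```python
import re

# The 'lmaster -a' output from the example above.
SAMPLE = """\
Configured endpoint is ha2-controller1
ha2-controller1:   enabled: True   master: True   insync: True   syncimages: True   overrule: False  shadow: False
ha2-controller2:   enabled: True   master: False  insync: True   syncimages: True   overrule: False  shadow: False
"""

_PAIR = re.compile(r"(\w+):\s+(True|False)")


def parse_lmaster_all(text):
    """Map each controller name to a dict of boolean HA flags."""
    status = {}
    for line in text.splitlines():
        host, sep, rest = line.partition(":")
        pairs = _PAIR.findall(rest)
        if not sep or not pairs:
            continue  # e.g. the 'Configured endpoint is ...' line
        status[host.strip()] = {key: value == "True" for key, value in pairs}
    return status


if __name__ == "__main__":
    status = parse_lmaster_all(SAMPLE)
    masters = [host for host, flags in status.items() if flags["master"]]
    out_of_sync = [host for host, flags in status.items() if not flags["insync"]]
    print("master:", masters, "out of sync:", out_of_sync)
```

Exactly one controller is expected to report master: True, so a script could raise an alert when the masters list has any other length.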

bootutil

Bootutil inspects and changes the UEFI/BIOS boot order of a node through its BMC's Redfish interface. It talks directly to the node's BMC rather than through Luna, so --host is the BMC's Redfish endpoint (including the protocol) and the user and password are the BMC credentials.

# bootutil -h

Usage: bootutil [options...] <mode>

<mode> can be either:
  list         -- list available boot options
  get          -- get current boot order
  set <order>  -- set current boot order

Available [options...]:
 -H, --host      -- Redfish host. Must include protocol, e.g. https://host
 -U, --user      -- HTTP user name
 -P, --password  -- HTTP user password

Mode         Description
list         Lists the available boot devices, each with its ID, name and description.
get          Shows the current boot order, in sequence, by boot-option ID.
set <order>  Sets the boot order. The order is a quoted, space-separated list of the boot-option IDs (as shown by list or get), in the desired order.

List the available boot devices of a node's BMC:

# bootutil -H https://10.148.0.1 -U admin -P <password> list

Available boot devices:

ID    |Name            |Desc
------+----------------+------------------------------------------------------
Boot0000|Hard Drive     |UEFI Hard Drive
Boot0001|Network        |UEFI PXE Network
Boot0002|UEFI Shell     |UEFI Shell

Show the current boot order:

# bootutil -H https://10.148.0.1 -U admin -P <password> get

Current boot order:
1 - Boot0000 UEFI Hard Drive
2 - Boot0001 UEFI PXE Network

Put network boot first, followed by the hard drive:

# bootutil -H https://10.148.0.1 -U admin -P <password> set "Boot0001 Boot0000"
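
When many nodes need the same change, the order argument can be derived from the list output instead of being typed by hand. The Python sketch below assumes the pipe-separated list layout shown above and picks network devices by the word "Network" in the Name column; parse_boot_devices, network_first and set_argv are hypothetical helpers, and only the bootutil options documented above are used.

```python
# The 'bootutil ... list' table from the example above.
LIST_OUTPUT = """\
ID    |Name            |Desc
------+----------------+------------------------------------------------------
Boot0000|Hard Drive     |UEFI Hard Drive
Boot0001|Network        |UEFI PXE Network
Boot0002|UEFI Shell     |UEFI Shell
"""


def parse_boot_devices(text):
    """Parse the device table into dicts with id, name and desc."""
    devices = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have three cells and an ID starting with 'Boot'.
        if len(cells) == 3 and cells[0].startswith("Boot"):
            devices.append({"id": cells[0], "name": cells[1], "desc": cells[2]})
    return devices


def network_first(devices, current_order):
    """Move network boot options to the front, keeping relative order."""
    net_ids = {d["id"] for d in devices if "network" in d["name"].lower()}
    # sorted() is stable: False (network) sorts before True (everything else).
    return sorted(current_order, key=lambda boot_id: boot_id not in net_ids)


def set_argv(host, user, password, order):
    """Argument list for subprocess.run; the order is one quoted argument."""
    return ["bootutil", "-H", host, "-U", user, "-P", password,
            "set", " ".join(order)]


if __name__ == "__main__":
    devices = parse_boot_devices(LIST_OUTPUT)
    order = network_first(devices, ["Boot0000", "Boot0001"])
    print(set_argv("https://10.148.0.1", "admin", "<password>", order))
```

Passing the result as a list to subprocess.run avoids shell quoting issues, since the space-separated order stays a single argument.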