luna2-daemon — Architecture & Node Boot Flow
Component: luna2-daemon (Luna 2 Project / TrinityX)
Source branch: development (HEAD 371511ce, version 2.1)
Listens on: TCP [::]:7050 (REST API, JSON)
Last updated: 10 Jun 2026
Scope: internal architecture of luna2-daemon, covering the role of the base, utils and routes layers, how the plugin system works (and how plugins are selected per node/group/hostname/distro, §6.4), the templ_install installer template, base/boot.py, and the end-to-end node boot/provisioning sequence. It also covers the node-side counterpart luna2-client (§12), the deb/rpm dracut module that provides the pre-pivot hook which authenticates the node and runs the installer. Written from the development branch (luna2-daemon) and main branch (luna2-client), for developers and integrators (not end users).
Two halves of one boot. luna2-daemon (server side, all sections except §12) renders and serves the iPXE menu, the node-boot script and the installer. luna2-client (node side, §12) is a dracut module baked into every OS image that consumes those artefacts: it brings up the NIC, authenticates via /tpm/<node>, fetches /boot/install/<node> and runs it, all before the initramfs pivots into the installed system.
1. Overview & design philosophy
Luna 2 Daemon is the core cluster-management service of TrinityX: a stateless Flask REST API served by Gunicorn, backed by a SQL database (SQLite / MySQL / PostgreSQL via unixODBC). It owns everything needed to define and provision a cluster — nodes, groups, OS images, networks, DNS, DHCP, BMC setup, secrets, switches, racks, monitoring and HA replication.
Its single most important job is netbooting and installing compute nodes: it generates the iPXE menus, the kernel/initrd boot scripts, and the bash installer that runs inside the node's initramfs.
| Property | Value |
|---|---|
| Entry point (WSGI app) | daemon/luna.py → daemon (Flask object) |
| Process manager | Gunicorn, config daemon/config/gunicorn.py |
| systemd unit | daemon/config/luna2-daemon.service |
| Bind / port | [::]:7050, 4 workers |
| Introspection routes | GET /version, GET /all-routes |
| Config file | luna.ini → parsed into CONSTANT (common/constant.py) |
⚠️ daemon/daemon.py contains only stub install/update/upgrade placeholders; it is not the running service. The live application is daemon/luna.py, loaded by Gunicorn via import luna in gunicorn.py.
1.1 Design philosophy
Five principles recur throughout the codebase and shape the rest of this document:
- Strict layering. A request flows routes → base → utils; the HTTP edge stays free of logic and the logic stays free of HTTP. One base method per route. (§2–§5)
- Plugin-driven, not patched. Site-, OS- and vendor-specific behaviour lives in plugins/, selected per node/group/hostname/distro and injected into outputs; the daemon core never changes for a new distro or BMC. (§6)
- Replicate intent, not rows. Multi-controller HA re-executes the method call that changed state on every peer rather than copying database rows, so heterogeneous controllers converge by running the same business logic. (§8)
- One return contract. Every layer returns (status, message[, request_id]); a message from the deepest plugin reaches the CLI unaltered, synchronously or via a polled status stream. (§15)
- Stateless, template-rendered output. The daemon holds no per-request state; everything a node sees (iPXE, installer, dhcp, dns) is a Jinja2 template rendered from the DB on demand. (§7; §9–§12)
1.2 Configuration inheritance (node → group → …)
Most of a node's settings are not stored on the node at all — they are resolved at read time from a precedence chain, so an administrator configures a group once and every member inherits it, overriding only what differs on a specific node. base/node.py:get_node() performs this resolution for every field and records the winning level in a _<field>_source marker (node / group / osimage / cluster / default), plus an _override flag when the node itself set the value — this is exactly what the CLI prints as "Config differs from parent — local overrides".
The order of precedence, most specific first, is node → group → OS image (+ tag) → cluster → daemon defaults: the most specific level that sets a field wins. Two profiles are referenced out-of-band: bmcsetup (BMC credentials) and the node/group interface definitions (network membership, DHCP).
A few fields compose rather than simply override — e.g. effective kerneloptions can be taken from the node, the group or the OS image; and an OS image whose imagefile is kickstart forces provision_method=kickstart regardless of the group or cluster.
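The resolution rule can be sketched in a few lines. This is a minimal illustration, not get_node() itself: the dict-of-dicts layout, the treatment of None/"" as "inherit", and the function name resolve_field are assumptions. It walks the precedence chain, records the winning level in a _<field>_source marker, flags node-level values with _override, and applies the kickstart composition rule described above.

```python
from typing import Any, Dict, Mapping

# Precedence chain, most specific first, as described in the text above.
LEVELS = ("node", "group", "osimage", "cluster", "default")


def resolve_field(field: str, layers: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Resolve one setting and record where it came from (_<field>_source).

    Hypothetical helper: `layers` maps a level name to that level's settings.
    """
    # Composition rule from the text: a kickstart image forces the method.
    if field == "provision_method" and layers.get("osimage", {}).get("imagefile") == "kickstart":
        return {field: "kickstart", f"_{field}_source": "osimage"}
    for level in LEVELS:
        value = layers.get(level, {}).get(field)
        if value is not None and value != "":  # assumption: None/"" mean "inherit"
            result: Dict[str, Any] = {field: value, f"_{field}_source": level}
            if level == "node":
                result["_override"] = True  # the node set this value locally
            return result
    return {field: None, f"_{field}_source": "default"}
```

A group that sets provision_method=torrent therefore wins over the cluster's http, while a node-level kerneloptions value wins over everything and is reported as a local override.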
2. High-level architecture
Layered application under daemon/. A request flows downward; output is rendered from templates.
2.1 Supporting packages
| Package | Role |
|---|---|
| daemon/common/ | Process-wide foundation: constants, bootstrap, auth/input decorators and the DB schema. constant.py loads luna.ini into CONSTANT + LOGGER + LUNAKEY; bootstrap.py seeds a first-run cluster; validate_auth.py / validate_input.py provide the @token_required / @validate_name decorators; database_layout.py declares every table. Constants & validation: §17; first-run bootstrap & schema: §14. |
| daemon/config/ | Shipped config: gunicorn.py, luna.ini, luna2-daemon.service, bootstrap.ini, nginx/transmission units. |
| daemon/templates/ | Jinja2 templates the daemon renders and serves (iPXE menus, node boot scripts, installer, dhcpd/kea, DNS zones). |
| daemon/log/ | Runtime log directory. |
3. The routes layer
routes/ is the HTTP edge. Each file defines a Flask Blueprint registered in luna.py. Handlers are deliberately thin: apply decorators, call one base method, translate the (status, data) tuple into an HTTP code, and (for boot endpoints) render a template. No business logic here.
Two guard decorators:
- @token_required (common/validate_auth.py) — enforces a JWT x-access-tokens header.
- @validate_name (common/validate_input.py) — sanitises path params (node/group names).
3.1 Route catalogue
| Blueprint file | Mount / purpose |
|---|---|
| auth.py | /token: authenticate, issue JWT |
| boot.py | /boot, /boot/search/mac/…, /boot/manual/…, /boot/install/<node>, /kickstart/install/<node>: the netboot + install entry points |
| boot_roles.py | /boot/roles/<role>: systemd role units + scripts |
| boot_scripts.py | /boot/scripts/<script>: pre/part/post install script bodies |
| config_node.py | /config/node/…: node CRUD, interfaces, status |
| config_group.py | /config/group/…: group CRUD |
| config_osimage.py / config_osgroup.py | OS image + os-group ops (pack, grab, push, tags) |
| config_network.py | networks, IP allocation |
| config_dns.py | DNS zone management |
| config_bmcsetup.py | BMC credential / channel profiles |
| config_switch.py | switch definitions (switchport detection) |
| config_otherdev.py | non-node devices (PDUs, etc.) |
| config_rack.py | rack / datacenter layout |
| config_secrets.py | /config/secrets/node/<name>: encrypted per-node/group secrets |
| config_cloud.py | cloud / alternative-provisioning targets |
| config_cluster.py | cluster-wide settings |
| config_status.py | long-running task status messages |
| control.py | /control/…: power actions via control plugins |
| monitor.py | /monitor/node/<name>: node state updates (installer posts here) |
| service.py | start/stop/reload of managed services (dhcp, dns…) |
| files.py | /files/…: serves kernels, initrds, image files |
| journal.py, ha.py, tables.py | HA replication, journal replay, raw table access between controllers |
| tracker.py | torrent tracker endpoint (torrent provisioning) |
| plugin_export.py / plugin_import.py | export/import of config + boot plugins between controllers |
4. The base layer
base/ holds the business logic — one class per managed resource. These classes assemble datasets from the DB, enforce domain rules, enqueue service actions, call plugins, and return (status, data) tuples.
| Class (file) | Responsibility |
|---|---|
| Boot (boot.py) | Provisioning brain: node discovery, iPXE menu, node-boot script, installer assembly (§9–§11) |
| Node (node.py) | node datasets, filtering, CRUD |
| Group (group.py) | group definitions + inheritance of image/kernel/netboot settings |
| Interface (interface.py) | node/group interfaces, MAC↔node binding |
| Network / Dns | IP networks, DNS zones, resolution |
| OSImage (osimage.py) | image pack/grab/push, tags, kernel/initrd |
| BMCSetup (bmcsetup.py) | BMC credential profiles |
| Switch / OtherDev / Rack | switches, other devices, rack layout |
| Secret (secret.py) | per-node/group encrypted secrets served at install time |
| Roles / Scripts | systemd role units; pre/part/post install scripts |
| Cloud / Cluster | cloud bursting targets, cluster settings |
| Control (control.py) | power control orchestration (delegates to control plugins) |
| Monitor / Service | node state tracking, managed-service control |
| Authentication | login + JWT issuance |
| Journal / Tables | HA: record & replay mutating calls to peer controllers |
| PluginExport / PluginImport | bundle and transfer config + plugins |
Pattern: a routes handler calls exactly one base method; the base method owns the DB queries and plugin calls. Handlers stay free of logic; base stays free of Flask/HTTP concerns.
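The layering contract can be illustrated without Flask. In this sketch the Database class is a hypothetical in-memory stand-in for utils/database.py (the real get_record signature differs), Node plays the base role and returns a (status, data) tuple, and get_config_node plays the routes role by calling one base method and mapping the status to an HTTP code. The response shape is an assumption for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


class Database:
    """Hypothetical in-memory stand-in for the utils/database.py gateway."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, dict]] = {"node": {}}

    def get_record(self, table: str, name: str) -> Optional[List[dict]]:
        row = self.tables.get(table, {}).get(name)
        return [row] if row else None


class Node:
    """base/ layer: business logic only, returns a (status, data) tuple."""

    def __init__(self, db: Database) -> None:
        self.db = db

    def get_node(self, name: str) -> Tuple[bool, Any]:
        rows = self.db.get_record("node", name)
        if not rows:
            return False, f"Node {name} not found"
        return True, rows[0]


def get_config_node(node: Node, name: str) -> Tuple[Dict[str, Any], int]:
    """routes/ layer: call exactly one base method, map status to an HTTP code."""
    status, data = node.get_node(name)
    if status:
        return {"config": {"node": {name: data}}}, 200
    return {"message": data}, 404
```

Because Node never touches HTTP, the same method can be replayed by the HA journal (§8) or called from a background thread unchanged.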
5. The utils layer
utils/ provides the shared services and helpers used by every base class — DB access, plugin loading, service control, queueing, HA mechanics, template assembly.
| Module | Responsibility |
|---|---|
| database.py | ODBC abstraction (get_record, get_record_join, update, insert, delete); the single DB gateway |
| helper.py | catch-all helpers, incl. plugin_finder() / plugin_load() (plugin entry points), bool/base64/jinja helpers, nodes_and_groups() |
| plugin_manager.py | PluginManager: resolves & caches plugin classes with search/priority + on-disk reload detection (§6) |
| plugin_tree.py | build_plugin_tree(): walks a plugin dir into a nested dict |
| template_manager.py | TemplateManager: same tree mechanism, locates .templ files |
| plugin_sync.py | PluginSync: replicates boot-plugin files between HA controllers (background worker) |
| config.py | builds rendered config sets (interfaces, DHCP, DNS) from DB records |
| service.py | Service: start/stop/reload/status of managed daemons |
| queue.py | Queue: internal task queue; mutating work is enqueued and drained by housekeeper threads |
| housekeeper.py | Housekeeper: background "mother" threads (status cleanup, queue drain, switchport scan, journal replication, invalid-config sweep, osimage tasks) |
| boot.py (UBoot) | boot-side helpers, e.g. verify_bootpause() (don't boot a node while its image is being packed) |
| osimage.py / downloader.py | image packing/pushing and download helpers |
| journal.py, controller.py, ha.py, request.py, tables.py, dbstructure.py | HA: track/replicate/replay mutating requests; manage DB schema (deep dive: §8) |
| monitor.py, status.py | node state + status message storage |
| log.py | Log.get_logger(): shared logger |
| filter.py, model.py, files.py, ping.py, control.py | input filtering, data models, file validation, ping registry, control offload |
6. How plugins work
Plugins let site- and OS-specific behaviour be swapped in without changing the daemon. They live under daemon/plugins/ and are loaded dynamically at request time.
6.1 Plugin categories
| Path | Kind | Provides |
|---|---|---|
| plugins/boot/provision/ | fetch var + create/cleanup | how a node downloads its image: http, torrent, kickstart, default |
| plugins/boot/network/ | string vars | per-distro interface config: redhat8/9/10, ubuntu, opensuse (+ .templ) |
| plugins/boot/bmc/ | config var | BMC config bash: default (ipmitool), dell |
| plugins/boot/scripts/ | methods | install scripts: diskfull, raid1, nodhcp, default |
| plugins/boot/roles/ | class | systemd role builders: bond, default |
| plugins/boot/detection/ | class | node identity discovery: switchport, cloud |
| plugins/control/ | class | power backends: default (ipmi), dell |
| plugins/hooks/ | class | lifecycle hooks: hooks/luna (startup/shutdown), hooks/config/*, hooks/control, hooks/monitor |
| plugins/osimage/ | class | image ops: operations/image, osgrab, ospush, filesystem, other/cleanup |
| plugins/export/ & plugins/import/ | class | config/Prometheus rule + boot-plugin export/import |
6.2 Loading mechanism
Two helpers on Helper (utils/helper.py) are the public entry points; they delegate to PluginManager:
```python
boot_plugins = Helper().plugin_finder(f'{plugins_path}/boot')                    # build tree
provision_plugin = Helper().plugin_load(boot_plugins, 'boot/provision', 'http')  # resolve class
```
- Tree build: build_plugin_tree() (plugin_tree.py) walks the dir with os.walk into a nested dict {dir: {...}, file.py: None}.
- Resolution + priority: PluginManager.load() tries candidate module names in order for a (root, levelone, leveltwo) request. E.g. boot/network + redhat + 9:
  1. plugins.boot.network.redhat9 (concatenated)
  2. plugins.boot.network.redhat.9 (sub-module)
  3. plugins.boot.network.redhat.default
  4. plugins.boot.network.redhat (levelone only)
  5. plugins.boot.network.default (final fallback)
- Class selection: _resolve_plugin_class() returns the requested class name (default Plugin), falling back to Plugin.
- Caching + hot reload: resolved classes are cached in _class_cache. Before serving from cache, _module_changed_on_disk() compares the file's (mtime_ns, size) fingerprint; if it changed, the cache entry is invalidated and the module is importlib.reload-ed, so operators can edit plugins live.
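The fingerprint-based hot reload above can be sketched with the standard library alone. ReloadingCache and its get() method are hypothetical names; the real PluginManager also performs the candidate search and keeps its own _class_cache layout. The sketch stores an (st_mtime_ns, st_size) pair per module, reloads with importlib.reload when the pair changes, and returns the requested class with a fallback to Plugin.

```python
import importlib
import os
from types import ModuleType
from typing import Dict, Tuple


class ReloadingCache:
    """Cache plugin modules; reload one when its source file changes on disk."""

    def __init__(self) -> None:
        self._modules: Dict[str, ModuleType] = {}
        self._fingerprints: Dict[str, Tuple[int, int]] = {}

    @staticmethod
    def _fingerprint(module: ModuleType) -> Tuple[int, int]:
        st = os.stat(module.__file__)
        return st.st_mtime_ns, st.st_size

    def _changed_on_disk(self, name: str) -> bool:
        return self._fingerprint(self._modules[name]) != self._fingerprints[name]

    def get(self, name: str, class_name: str = "Plugin") -> type:
        module = self._modules.get(name)
        if module is None:
            module = importlib.import_module(name)
        elif self._changed_on_disk(name):
            module = importlib.reload(module)  # file edited: re-execute the module
        self._modules[name] = module
        self._fingerprints[name] = self._fingerprint(module)
        # requested class name first, falling back to Plugin
        return getattr(module, class_name, None) or getattr(module, "Plugin")
```

Using size alongside mtime_ns catches edits that land within the filesystem's timestamp granularity, as long as the file length changes.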
6.3 The segment-injection pattern (most important)
Provision/network/bmc/script plugins do not run on the node. Each exposes its behaviour as a bash string attribute (or method source). The daemon reads the raw installer template, string-substitutes each plugin's snippet into a named marker, then Jinja2-renders the whole thing.
Example — the http provision plugin (plugins/boot/provision/http.py):
```python
class Plugin():

    def create(self, *args, **kwargs):   # arguments elided in this excerpt
        return True, "Success"

    def cleanup(self, *args, **kwargs):  # arguments elided in this excerpt
        return True, "Success"

    # 'fetch' is the bash injected into the installer to download the image
    fetch = """
curl $INTERFACE -H "x-access-tokens: $LUNA_TOKEN" -s \
{{ WEBSERVER_PROTOCOL }}://{{ LUNA_CONTROLLER }}:{{ WEBSERVER_PORT }}/files/{{ LUNA_IMAGEFILE }} \
> /{{ LUNA_SYSTEMROOT }}/{{ LUNA_IMAGEFILE }}
return $?
"""
```
In base/boot.py:install() the snippet is wrapped and substituted into the marker:
```python
segment = str(provision_plugin().fetch)
segment = f"function download_{method} {{\n{segment}\n}}\n## FETCH CODE SEGMENT"
template_data = template_data.replace("## FETCH CODE SEGMENT", segment)
```
The same mechanism fills every ## … CODE SEGMENT marker in templ_install.cfg: NETWORK INIT, INTERFACE, GATEWAY, DNS (+ IPv6), BMC, and SCRIPT PRE/PART/POST. The ## FETCH CODE SEGMENT marker is re-appended each time so primary + fallback provision methods can be stacked.
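A small, stdlib-only sketch of the injection step follows. It assumes, hypothetically, that every marker is spelled "## <NAME> CODE SEGMENT" as the FETCH one is, and it skips the Jinja2 render that follows. inject_segments fills one-shot markers; inject_fetch mirrors the snippet above and re-appends the FETCH marker, so calling it for the primary and then the fallback method stacks both download functions in order.

```python
from typing import Dict

FETCH_MARKER = "## FETCH CODE SEGMENT"


def inject_segments(template_data: str, segments: Dict[str, str]) -> str:
    """Replace each '## <NAME> CODE SEGMENT' marker with a plugin's bash."""
    for name, bash in segments.items():
        template_data = template_data.replace(f"## {name} CODE SEGMENT", bash)
    return template_data


def inject_fetch(template_data: str, method: str, fetch_bash: str) -> str:
    """Wrap fetch bash in download_<method>() and keep the marker for stacking."""
    segment = f"function download_{method} {{\n{fetch_bash}\n}}\n{FETCH_MARKER}"
    return template_data.replace(FETCH_MARKER, segment)
```

Because the marker survives each call, the installer ends up with download_<primary> followed by download_<fallback>, which is what download_image later tries in that order.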
6.4 How a plugin is selected (node / hostname / group / distro)
The key idea: plugin selection is driven by attributes of the specific node being provisioned. PluginManager.load() accepts a list for levelone and tries each entry in order, falling through to default. So a list like [nodename, group, distribution] produces a most-specific-wins hierarchy: hostname → group → distro → default.
All selection happens in base/boot.py:install() (and discover_*) via Helper().plugin_load(tree, root, levelone, leveltwo):
| Plugin | Selected by (levelone → leveltwo) | Effect |
|---|---|---|
| osimage/operations/image | distribution → osrelease | per-distro image unpack + systemroot (e.g. ubuntu vs default) |
| boot/provision | provision_method, then provision_fallback | how the node downloads its image (http/torrent/kickstart), from node/group/cluster config |
| boot/network | distribution → osrelease | per-distro/release interface config (redhat+9 → redhat9, ubuntu, opensuse) |
| boot/bmc | [nodename, group] | BMC config: a node-specific plugin overrides a group one, else default |
| boot/scripts | [nodename, group, distribution] | install scripts resolved node → group → distro → default |
| boot/detection | switchport, cloud (pre-loaded) | identifies which node a booting MAC is, by switch port or cloud metadata |
For each candidate, PluginManager's internal resolution order is: exact levelone+leveltwo file → levelone/leveltwo submodule → levelone/default → levelone.py → <root>/default. This is what lets you drop a plugins/boot/bmc/<hostname>.py or plugins/boot/network/redhat9.py and have it picked up automatically for matching nodes, with default.py always as the safety net.
distribution/osrelease come from the node's assigned OS image; provision_method/provision_fallback from node/group/cluster provisioning settings; nodename/group from the resolved node. Change any of these in the DB and a different plugin is selected on the next boot, with no daemon restart needed (plugins hot-reload, §6.2).
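The combined effect of a list-valued levelone and the per-entry resolution order can be modelled as a pure function. This is a simplification, not PluginManager.load(): the set available of importable module names stands in for the real plugin tree and import attempts, and the names candidates and select_plugin are hypothetical. Each entry's full candidate chain is tried before moving to the next entry, with <root>/default as the last resort.

```python
from typing import List, Optional, Sequence, Set, Union


def candidates(root: str, levelone: str, leveltwo: Optional[str] = None) -> List[str]:
    """Candidate module names for one levelone entry, most specific first."""
    base = "plugins." + root.replace("/", ".")
    names: List[str] = []
    if leveltwo:
        names.append(f"{base}.{levelone}{leveltwo}")   # e.g. redhat9
        names.append(f"{base}.{levelone}.{leveltwo}")  # e.g. redhat/9
    names.append(f"{base}.{levelone}.default")         # e.g. redhat/default
    names.append(f"{base}.{levelone}")                 # e.g. redhat.py
    return names


def select_plugin(root: str, levelone: Union[str, Sequence[str]],
                  leveltwo: Optional[str], available: Set[str]) -> str:
    """Try every levelone entry in order, then fall back to <root>/default."""
    entries = [levelone] if isinstance(levelone, str) else list(levelone)
    for entry in entries:
        if not entry:
            continue  # e.g. a node without a group
        for name in candidates(root, entry, leveltwo):
            if name in available:
                return name
    return "plugins." + root.replace("/", ".") + ".default"
```

For boot/bmc with [nodename, group], dropping plugins/boot/bmc/node007.py affects only node007, while every other member of a group with its own plugin still gets the group file.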
7. Templates — dynamic config rendering
Almost nothing luna2-daemon emits is static. Every artefact a node or a managed service sees — the iPXE menu, the installer, the DHCP and DNS configuration — is a Jinja2 template under daemon/templates/, rendered on demand from the current database state. This is what keeps controllers stateless (§1.1): change a row, re-render, never hand-edit a config file.
7.1 The template set
| Group | Templates | Rendered for |
|---|---|---|
| Boot | templ_boot_ipxe[_short].cfg, templ_nodeboot.cfg, templ_install.cfg, templ_boot_disk.cfg, templ_boot_failed.cfg (+ _kickstart variants) | served to the node during netboot (§9–§11) |
| DHCP | templ_dhcpd.cfg, templ_dhcpd6.cfg, templ_kea-dhcp4.cfg, templ_kea-dhcp6.cfg | the controller's DHCP server config |
| DNS | templ_dns_conf.cfg, templ_dns_zone.cfg, templ_dns_zones_conf.cfg | named/zone files |
7.2 Two rendering paths
- Per-request boot artefacts. A routes handler renders a boot template with render_template / render_template_string, fills it with DB-derived data, and returns the text straight to the node, e.g. base/boot.py:install() → templ_install.cfg (§10). Nothing is written to disk.
- On-change service configs. When a network, node or DNS record changes, utils/config.py (Config) renders the DHCP/DNS templates with a Jinja2 Environment(FileSystemLoader(...)) from the DB, writes the result into the working TMP_DIRECTORY, validates it (dhcpd -t -cf …), then deploys it (e.g. /etc/dhcp/dhcpd.conf) and reloads the service. This is the dynamic config rendering behind luna network change (§13.4): the operator never edits dhcpd.conf or zone files; they are regenerated from the templates.
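The on-change path boils down to "render, stage, validate, deploy". The sketch below is an assumption-laden illustration, not Config's code: it swaps Jinja2 for string.Template to stay stdlib-only, takes the validator as a callback, and omits the service reload. The live file is only replaced once the staged copy passes validation.

```python
import os
import shutil
import tempfile
from string import Template
from typing import Callable, Mapping


def render_validate_deploy(template_text: str, values: Mapping[str, str],
                           tmp_dir: str, target: str,
                           validate: Callable[[str], bool]) -> bool:
    """Render into tmp_dir, validate the staged file, then deploy it to target."""
    # string.Template stands in for Jinja2 to keep the sketch stdlib-only
    rendered = Template(template_text).substitute(values)
    fd, staged = tempfile.mkstemp(dir=tmp_dir, suffix=".conf")
    with os.fdopen(fd, "w") as handle:
        handle.write(rendered)
    if not validate(staged):
        os.unlink(staged)  # leave the live config untouched on failure
        return False
    shutil.move(staged, target)  # replaces target; also works across filesystems
    return True
```

For DHCP a validator could be `lambda path: subprocess.run(["dhcpd", "-t", "-cf", path]).returncode == 0`, mirroring the dhcpd -t -cf check mentioned above.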
7.3 Template resolution & overrides (shared with plugins)
utils/template_manager.py (TemplateManager) locates templates with the same tree + search/priority mechanism as the plugin loader (§6.2). A .templ file can therefore be selected per distribution/osrelease/group/node and takes precedence over the equivalent .py plugin — e.g. a node's interface block is taken from plugins/boot/network/redhat9.templ if present, otherwise the redhat9.py network plugin's bash segments are injected instead (§6.3, §6.4).
At startup, constant.py copies the templates listed in TEMPLATES.TEMPLATE_LIST into the working TMP_DIRECTORY, so the running daemon always renders from an isolated, validated set.
8. High availability, the journal & replication philosophy
8.1 Philosophy — replicate intent, not rows
luna2-daemon scales to multiple controllers with an active/active, eventually-consistent model that replicates intent, not data. Rather than replicating SQL rows or shipping a binary DB log, each controller records the method call that changed state and re-executes that same call on every peer. The unit of replication is a logical request ("Node.node_update on object X with this payload"), not a row diff.
Consequences of this choice:
- Controllers may run different DB backends (one SQLite, one MySQL) and still converge; only the logical operation crosses the wire.
- Replay runs the same base business logic on each controller, so derived state (rendered configs, queued service restarts, journalled sub-calls) is reproduced naturally; a row copy would skip all of that.
- It is best-effort + self-healing: entries are delivered redundantly (push and pull), and a periodic table-checksum comparison repairs any drift the journal missed.
8.2 Roles & state — the ha table
A single-row ha table holds the controller's view of the cluster (§14.2):
| Field | Meaning |
|---|---|
| enabled | H/A mode on/off. If off, Journal.add_request is a no-op ("Not in H/A mode") |
| master | is this controller the master? One master at a time owns master-only operations |
| insync | has this controller pulled the full journal and converged? Until true it refuses new work |
| sharedip | the cluster presents a single floating beacon IP; the beacon controller currently owns it |
| overrule | emergency bypass of the insync gate (operator-forced) |
| syncimages | whether OS-image bits are rsynced between controllers |
| updated | timestamp, used for last-writer-wins master election (set_role rejects an older request) |
Related identity (utils/controller.py, utils/ha.py): every controller knows me (its own hostname), whether it is a beacon (owns the shared IP) or a shadow (standby / read-mostly, skipped as a replication source).
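The last-writer-wins rule on updated can be shown in isolation. HAState and this set_role signature are hypothetical, and accepting a request whose timestamp equals the stored one is an assumption. A role change is applied only if it is not older than the last accepted one.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HAState:
    """Hypothetical model of the master/updated pair in the ha table."""

    master: bool = False
    updated: datetime = datetime.min

    def set_role(self, master: bool, requested_at: datetime) -> bool:
        """Apply a role change unless it is older than the last one applied."""
        if requested_at < self.updated:
            return False  # stale request: last writer wins
        self.master = master
        self.updated = requested_at
        return True
```

A delayed "demote" that arrives after a newer "promote" is therefore ignored instead of flipping the controller back.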
8.3 The journal entry
Each mutating base method, when it changes state, calls Journal().add_request(...), writing one row to the journal table:
| Column | Purpose |
|---|---|
| function | Class.method to replay, e.g. Interface.change_node_interface |
| object / param | positional args for the replayed call |
| payload | base64-encoded JSON body for the call |
| masteronly / remoteonly | replay only on the master / only on non-masters |
| sendby | originating controller |
| sendto | a replicator hop (forward via the beacon when sharedip) |
| sendfor | the controller this entry must ultimately be applied on |
| tries / created | retry count + ordering key |
add_request blocks until the controller is insync (up to keeptrying seconds) unless overrule is set, so a freshly-started controller never originates work before it has caught up.
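The insync gate can be sketched as a bounded polling loop. The callables is_insync/overrule and the one-second default interval are assumptions for illustration; the daemon reads these flags from the ha table. The function returns True as soon as the controller is in sync or the operator has set overrule, and False once keeptrying seconds have passed.

```python
import time
from typing import Callable


def wait_for_insync(is_insync: Callable[[], bool], overrule: Callable[[], bool],
                    keeptrying: float, interval: float = 1.0) -> bool:
    """Block until in sync (or overruled); give up after keeptrying seconds."""
    deadline = time.monotonic() + keeptrying
    while True:
        if overrule() or is_insync():
            return True
        if time.monotonic() >= deadline:
            return False  # caller refuses to journal the request
        time.sleep(interval)
```

time.monotonic is used so that wall-clock adjustments (e.g. NTP steps on a freshly booted controller) cannot stretch or shorten the wait.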
8.4 Replay — handle_requests()
On the receiving side the journal thread reads entries where sendfor = me, ordered by created, and for each:
- drops it if masteronly/remoteonly does not match this controller's role;
- decodes the payload and resolves the class dynamically: repl_class = globals()[class_name] → e.g. base.node.Node, then repl_function = getattr(repl_class, function_name);
- calls it with the stored args, the identical code path that ran on the origin;
- for OSImage operations (pack/clone_osimage/grab) it additionally queues an image rsync task (method replay alone cannot move image bits, §8.6);
- always deletes the journal row afterwards, success or failure (drift is caught later by table hashing, not by infinite retry).
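The replay loop above can be sketched as follows. This is not handle_requests() itself: an explicit registry dict replaces globals(), the class is instantiated before the method lookup, and the argument order (object, param, then decoded payload) is an assumption. The OSImage rsync step is omitted. Note that the finally clause also deletes entries skipped by the role filter.

```python
import base64
import json
from typing import Any, Callable, Dict, List


def handle_requests(entries: List[Dict[str, Any]], me: str, is_master: bool,
                    registry: Dict[str, type],
                    delete: Callable[[Dict[str, Any]], None]) -> List[str]:
    """Replay journal entries addressed to `me`, oldest first."""
    applied: List[str] = []
    mine = sorted((e for e in entries if e.get("sendfor") == me),
                  key=lambda e: e["created"])
    for entry in mine:
        try:
            # role filter: drop entries not meant for this controller's role
            if entry.get("masteronly") and not is_master:
                continue
            if entry.get("remoteonly") and is_master:
                continue
            class_name, function_name = entry["function"].split(".", 1)
            args = [entry[k] for k in ("object", "param") if entry.get(k) is not None]
            if entry.get("payload"):
                args.append(json.loads(base64.b64decode(entry["payload"])))
            # dynamic resolution, like globals()[class_name] in the daemon
            repl_function = getattr(registry[class_name](), function_name)
            repl_function(*args)
            applied.append(entry["function"])
        except Exception:  # no retry: table-hash verification repairs drift
            pass
        finally:
            delete(entry)  # the row is always dropped, success or failure
    return applied
```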
8.5 Convergence loop — journal_mother
The replication thread (one of the elected background workers, §16; it exits immediately if enabled is false) runs roughly every 5 s:
- startup: pull the journal from every peer until successful, then set_insync(True) and clear overrule;
- push, via pushto_controllers(): POST my sendby=me entries (and forward sendto=me replicator entries) to each target's /journal; delete locally only on a successful POST;
- pull, via pullfrom_controllers(): GET /journal/<me> from each peer (skipping beacon/shadow as appropriate);
- verify: periodically compare per-table checksums across controllers (Tables.verify_tablehashes_controllers); on mismatch, hard-copy the table from the authoritative host (import_table_from_host).
Push and pull together mean a single entry still arrives even if one controller was briefly down.
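The verify step can be illustrated with an order-independent table checksum. The real hashing in Tables.verify_tablehashes_controllers is not described here, so SHA-256 over sorted canonical JSON rows is an assumption; sorting matters because different backends (SQLite vs MySQL) may return rows in different orders. tables_to_repair lists the tables whose local hash differs from a peer's.

```python
import hashlib
import json
from typing import Dict, List, Sequence


def table_hash(rows: Sequence[dict]) -> str:
    """Checksum a table's rows independently of the order they were read in."""
    canonical = sorted(json.dumps(r, sort_keys=True, default=str) for r in rows)
    return hashlib.sha256("\n".join(canonical).encode()).hexdigest()


def tables_to_repair(local: Dict[str, Sequence[dict]],
                     remote_hashes: Dict[str, str]) -> List[str]:
    """Names of tables whose local checksum differs from the peer's."""
    return sorted(name for name, rows in local.items()
                  if remote_hashes.get(name) != table_hash(rows))
```

Each name returned would then be hard-copied from the authoritative host, as import_table_from_host does in the daemon.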
8.6 OS image synchronisation between controllers
Replaying a method (§8.4) keeps database state identical, but an OS image is also files on disk — a packed imagefile/kernelfile/initrdfile under FILES.IMAGE_FILES, plus an unpacked chroot tree under FILES.IMAGE_DIRECTORY. Those bytes must physically move between controllers. Luna does this as a journalled task chain, not a raw copy, so every controller performs (and records) the same steps.
Trigger. When an OSImage mutation (pack, clone_osimage, grab) replays on a peer, handle_requests calls queue_source_sync / queue_target_sync / queue_source_sync_by_node_name, which enqueue a parked sync_osimage_with_master task (subsystem osimage) carrying <osimage>:<master>.
Execution. tasks_mother (§16.1) picks up sync_osimage_with_master and issues a chain of journalled sub-requests, so the sync is itself replicated consistently:
1. OsImager.schedule_cleanup: clear the target image area;
2. OSImage.update_osimage (payload with kernelfile/initrdfile/imagefile): bring DB metadata in line with the master;
3. Downloader.pull_image_files: download the kernel, initrd and image tarball from the master over the authenticated /files/ endpoint (Request().download_file) into FILES.IMAGE_FILES;
4. OsImager.schedule_provision: schedule the provisioning artefacts;
5. only if HA().get_syncimages() is true: queue unpack_osimage:<osimage>, which unpacks the tarball into the chroot tree under IMAGE_DIRECTORY. With syncimages=false the files are pulled but the unpacked filesystem is not replicated.
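The shape of that chain, and where syncimages gates it, can be captured as a plan builder. plan_osimage_sync is a hypothetical helper that only returns human-readable step descriptions; the daemon issues these as journalled sub-requests with their own payloads, which are not modelled here. It parses the <osimage>:<master> task parameter and appends the unpack task only when syncimages is true.

```python
from typing import List


def plan_osimage_sync(task_param: str, syncimages: bool) -> List[str]:
    """Expand a sync_osimage_with_master '<osimage>:<master>' task into steps."""
    osimage, master = task_param.split(":", 1)
    steps = [
        f"OsImager.schedule_cleanup({osimage})",
        f"OSImage.update_osimage({osimage})",
        f"Downloader.pull_image_files({osimage}, {master})",
        f"OsImager.schedule_provision({osimage})",
    ]
    if syncimages:
        # only now is the tarball unpacked into the chroot tree
        steps.append(f"queue unpack_osimage:{osimage}")
    return steps
```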
A second path, Downloader.pull_image_data, syncs the unpacked tree directly between controllers via the filesystem plugin's sync() (rsync -aHv --numeric-ids --one-file-system --delete-after …).
Are plugins relevant? Yes — centrally. The whole osimage/ plugin family abstracts how images are stored, copied and moved, so the sync is filesystem-agnostic:
| Plugin | Selected by | Role in sync |
|---|---|---|
| osimage/filesystem/<x> | PLUGINS.IMAGE_FILESYSTEM (default value: default) | performs the actual cross-controller copy: getpath, clone, sync (the rsync), extract (untar into the tree). Override it for ZFS/btrfs snapshotting or versioned read-only image stores |
| osimage/operations/image/<distro> | distribution → osrelease | how a chroot becomes a bootable image (kernel/initrd/tarball): the pack/build step |
| osimage/operations/osgrab/<…> | [node, distro, osimage, group] | grab an image from a running node |
| osimage/operations/ospush/<…> | [node, distro, osimage, group] | push / live-sync an image to nodes |
| boot/provision/<method> | provision_method | .create() prepares the downloadable artefact (http file, torrent) that nodes later fetch with the same plugin's fetch (§6.3) |
Because the filesystem and operations plugins are chosen exactly as in §6.4 (by IMAGE_FILESYSTEM, distro/release, or node/group), image storage and replication can be tailored per site and per OS without touching daemon code.
syncimages (in the ha table) is the master switch for replicating image filesystem content; image metadata and the packed files are always kept in step through the journal + /files/ download. A manual resync can be triggered with GET /ha/syncimage/<name>.
8.7 Inter-controller API
Controllers talk over the normal REST API using short-lived tokens (utils/request.py → get_token, get_request, post_request, download_file):
| Endpoint | Use |
|---|---|
| POST /journal | receive a batch of journal entries |
| GET /journal/<name> | hand a controller the entries queued for it |
| GET /journal/<name>/_delete | drop applied entries |
| GET /ha/state, /ha/master, /ha/controllers | inspect H/A status |
| GET /ha/master/_set, /ha/overrule/_set | force master / bypass insync |
| GET /table/hashes, /table/data/<name> | checksum compare + hard table copy |
| GET /ping | liveness for verify_pings |
Mental model: the journal is a replicated to-do list of method calls; every controller drains the part addressed to it by re-running the same base method, and a table-checksum sweep is the backstop that guarantees eventual convergence even when individual journal deliveries are lost.
9. base/boot.py — the provisioning brain
The Boot class backs every /boot* endpoint. At import it pre-loads the switchport and cloud detection plugins; in __init__ it builds the boot and osimage plugin trees and resolves controller identity (IP, network, beacon).
| Method | Endpoint | Output |
|---|---|---|
| default() | GET /boot | templ_boot_ipxe.cfg: the iPXE menu (ask / unassigned / choose / category / enter…) |
| boot_short() | GET /boot/short | templ_boot_ipxe_short.cfg |
| boot_disk() | GET /boot/disk | templ_boot_disk.cfg: local-disk boot |
| discover_mac(mac) | GET /boot/search/mac/<mac> | finds/creates the node for a MAC, renders templ_nodeboot.cfg |
| discover_group_mac(group, mac) | GET /boot/manual/group/… | picks the next free node in a group |
| discover_hostname_mac(host, mac) | GET /boot/manual/hostname/… | binds a chosen hostname to a MAC |
| install(node[, 'kickstart']) | GET /boot/install/<node> | assembles & returns templ_install.cfg |
9.1 What node selection (discover_mac) does
- Looks up the node owning the MAC (via the nodeinterface join). If none, it can auto-create/assign one with find_next_suitable_node() (honouring group provision_interface, nextnode, makeupname).
- On (re)assignment, binds the MAC to the node's BOOTIF via Interface.change_node_interface (journalled for HA) and clears the MAC from any other node (clear_existing_mac).
- Resolves IP/gateway (IPv4/IPv6/dhcp), OS image + tag (node overrides group), kernel options and the netboot flag, then renders templ_nodeboot.cfg.
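The MAC-binding invariant (one MAC belongs to at most one node) can be sketched over a plain dict. bind_boot_mac is hypothetical; the dict layout, lower-casing MACs and clearing to an empty string are assumptions. The daemon does this through Interface.change_node_interface and clear_existing_mac against the nodeinterface table.

```python
from typing import Dict


def bind_boot_mac(interfaces: Dict[str, Dict[str, str]], node: str, mac: str,
                  interface: str = "BOOTIF") -> None:
    """Bind `mac` to node/interface and clear it from every other node."""
    mac = mac.lower()  # assumption: MACs are compared case-insensitively
    for other, ifaces in interfaces.items():
        if other == node:
            continue
        for name, bound in list(ifaces.items()):
            if bound.lower() == mac:
                ifaces[name] = ""  # clear the MAC from the previous owner
    interfaces.setdefault(node, {})[interface] = mac
```

Clearing the old owner first is what stops a swapped motherboard or re-cabled node from leaving two nodes answering to the same MAC at the next DHCP/iPXE lookup.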
9.2 What install() does
- Re-resolves node, image/tag and kernel options; restarts dhcp/dhcp6 via the queue.
- Branches on image type: regular → templ_install.cfg; kickstart → templ_nodeboot_kickstart.cfg; netboot=false → templ_boot_disk.cfg; honours verify_bootpause() so a node won't boot while its image is being packed.
- Injects all plugin segments (provision fetch, network init/interface/gateway/dns, bmc config, script pre/part/post), sets state install.rendered, and returns the assembled template + all render variables.
- If any required value is None, returns a rendered failure template (templ_boot_failed.cfg) with a human-readable reason.
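The template choice can be summarised as a small decision function. choose_install_template, the keys imagefile/netboot and the order of the checks (missing values first) are assumptions for illustration; verify_bootpause() and plugin injection are left out. It returns the template name plus an optional failure reason.

```python
from typing import Mapping, Optional, Sequence, Tuple


def choose_install_template(data: Mapping[str, object],
                            required: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Pick the template install() would render, with a reason on failure."""
    missing = [key for key in required if data.get(key) is None]
    if missing:
        return "templ_boot_failed.cfg", "missing values: " + ", ".join(missing)
    if data.get("netboot") is False:
        return "templ_boot_disk.cfg", None  # boot the local disk instead
    if data.get("imagefile") == "kickstart":
        return "templ_nodeboot_kickstart.cfg", None
    return "templ_install.cfg", None
```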
10. The installer template: templ_install.cfg
daemon/templates/templ_install.cfg is the bash script that runs inside the node's initramfs (dracut) to install the OS image. It is delivered by GET /boot/install/<node> after the daemon (a) injects the plugin segments and (b) Jinja2-renders the node-specific variables.
10.1 How it is rendered
1. routes/boot.py:boot_install (token-protected) calls Boot().install(node).
2. base/boot.py:install() reads templ_install.cfg, injects plugin segments (§6.3), sets node state install.rendered and returns data['template_data'].
3. The route renders it with render_template_string(...), passing ~30 LUNA_*/NODE_*/PROVISION_* variables.
10.2 Key rendered variables
| Variable | Source / meaning |
|---|---|
| NODE_NAME, NODE_HOSTNAME | resolved node identity |
| LUNA_GROUP, LUNA_DISTRIBUTION, LUNA_OSRELEASE | group + OS image metadata |
| LUNA_IMAGEFILE | image tarball to fetch and unpack |
| LUNA_SYSTEMROOT | target root (e.g. sysroot) |
| PROVISION_METHOD / PROVISION_FALLBACK | primary + backup provision plugin names |
| LUNA_INTERFACES | per-interface config (rendered by network plugin) |
| LUNA_PRESCRIPT/PARTSCRIPT/POSTSCRIPT | base64 user scripts |
| LUNA_ROLES / LUNA_SCRIPTS | comma-lists fetched at install time |
| LUNA_SETUPBMC / LUNA_BMC | BMC config block (optional) |
| LUNA_TOKEN | JWT used by the installer to call back to the API |
| LUNA_API_PROTOCOL/PORT, WEBSERVER_* | controller endpoints |
10.3 Execution order (bottom of the script)
```bash
lunainit                  # create /lunatmp and /sysroot
dynamic_ip_check          # if DHCP boot, POST real IP back to /config/node/<name>
node_scripts              # (if LUNA_SCRIPTS) fetch pre/part/post script bodies
prescript                 # run pre script + custom 'pre' scripts
bmcsetup                  # (if LUNA_SETUPBMC) configure BMC channel/credentials
partscript                # partition / filesystem creation
download_image            # PROVISION_METHOD then PROVISION_FALLBACK (provision plugin 'fetch')
unpack_imagefile          # tar -I lbzip2 extract image into /sysroot (ACL-aware)
collect_mac_n_name_net    # map MACs <-> interface names
change_net                # write NetworkManager connections (network plugin segments)
node_secrets              # GET /config/secrets/node/<name> -> write secret files (chmod 600)
postscript                # run post script + custom 'post' scripts
node_roles                # GET /boot/roles/<role> -> install + enable luna-<role>.service
fix_capabilities          # restore file capabilities (ping, arping…)
restore_selinux_context   # setfiles relabel if SELinux policy present
update_system_info        # dmidecode vendor/serial -> POST /config/node/<name>
cleanup                   # wipe /sysroot/tmp
update_status "install.success"
```
Throughout, update_status "<state>" POSTs to /monitor/node/<name> so the controller (and luna2-cli) show live progress: install.prescript, install.download, install.unpack, install.setnet, install.secrets, install.roles, install.success, etc.
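The control flow above (run each function in order, report a state, stop on failure) can be modelled in a few lines. This Python sketch only mirrors the bash script's structure. It assumes the script aborts at the first failing step so that luna2-start.sh can retry (§12.3); the failure state name "install.error.<step>" and the run_install helper are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# (function name, state reported before it runs or None, callable returning success)
Step = Tuple[str, Optional[str], Callable[[], bool]]


def run_install(steps: List[Step], update_status: Callable[[str], None]) -> bool:
    """Run installer steps in order, report progress, and stop at the first failure."""
    for name, state, func in steps:
        if state is not None:
            update_status(state)                    # e.g. POST to /monitor/node/<name>
        if not func():
            update_status(f"install.error.{name}")  # hypothetical failure state name
            return False                            # luna2-start.sh would then retry
    update_status("install.success")
    return True


reported: List[str] = []
steps: List[Step] = [
    ("lunainit", None, lambda: True),               # assumed here to report no state
    ("prescript", "install.prescript", lambda: True),
    ("download_image", "install.download", lambda: True),
    ("unpack_imagefile", "install.unpack", lambda: True),
]
ok = run_install(steps, reported.append)
print(ok, reported)
# prints: True ['install.prescript', 'install.download', 'install.unpack', 'install.success']
```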
11. How a node typically boots (end to end)
A re-provision is just this sequence again. A node configured with netboot=false short-circuits at Stage 2/4 to templ_boot_disk.cfg and boots its local disk instead of re-installing.
12. luna2-client — the node-side pre-pivot hook
luna2-daemon produces and serves boot artefacts; luna2-client is what actually runs them on the node. It is the node-side half of the boot story, shipped from a separate GitLab repo (clustervision/luna2-client, branch main). It is not a Python service — it is a dracut module named 95luna baked into every OS image as an OS package.
12.1 Packaging (deb / rpm)
One package per distro-family × architecture; all carry the same payload:
| Path | Distro family | Arch | Builder |
|---|---|---|---|
| redhat/x86, redhat/arm | RHEL / Rocky / Alma | x86_64 / aarch64 | create_rpm.sh + luna2-client.spec |
| opensuse/x86, opensuse/arm | openSUSE / SLES | x86_64 / aarch64 | create_rpm.sh + luna2-client.spec |
| ubuntu/x86, ubuntu/arm | Ubuntu / Debian | x86_64 / aarch64 | create_deb.sh |
The package delivers the 95luna dracut module under /usr/lib/dracut/modules.d/95luna/, an /etc/dracut.conf.d/luna2.conf that pulls extra tools into the initramfs, a /usr/sbin/dhclient-script-luna, and libluna-fakeuname.so. The spec/control file pulls in the runtime tools the installer needs (curl, aria2, tar, lbzip2, parted, gdisk, lvm2, mdadm, ipmitool, tpm2-tools, dropbear, dmidecode, jq…).
How it reaches the node: luna2-client package → installed into the OS image → dracut bakes the 95luna module into that image's initrd → the daemon serves that initrd as OSIMAGE_INITRDFILE in templ_nodeboot.cfg. So the initrd a node netboots already knows how to talk to luna2-daemon.
12.2 The two dracut hooks (where the pre-pivot start is wired in)
module-setup.sh registers two hooks and bakes node-side credentials/keys into the initramfs:
- inst_hook cmdline 99 luna-parse-cmdline.sh: runs in dracut's cmdline phase. Its whole job is [ $root = "luna" ] && rootok=1, which tells dracut that the synthetic root=luna (set by templ_nodeboot.cfg) is valid, so it won't hang waiting for a real root device.
- inst_hook initqueue/finished 99 luna2-start.sh: runs at the end of the initqueue, once the network device has settled but before dracut pivots (switch_root) into the real root. This is the initial pre-pivot luna start.
It also bakes in: the node-side client config /trinity/local/luna/node/config/luna.ini (holds API_USERNAME/API_PASSWORD), the host SSH keys + /root/.ssh/authorized_keys (so an operator can SSH/dropbear into a node during install), and the CA bundle.
12.3 luna2-start.sh — what the pre-pivot hook does
Guarded by if [ "x$root" = "xluna" ], so it only fires on Luna-provisioned boots:
- luna_init / luna_start: set the hostname from luna.hostname; start dropbear (or sshd) and a rescue shell on tty2 for live debugging; match luna.mac to a NIC (find_nic) and bring networking up, either DHCP (luna.bootproto=dhcp) or static luna.ip + luna.gw. Exposes LUNA_BOOTIF / LUNA_BOOTPROTO for the installer.
- fetch_token: read API_USERNAME / API_PASSWORD from the baked-in luna.ini, read the TPM PCR (tpm2_pcrread sha256:0), and POST {tpm_sha256, username, password} to ${luna.url}/tpm/${luna.node}. The daemon (base/authentication.py) matches the node's tpm_sha256, or registers it on first boot, and returns a JWT. This is how a node authenticates itself before it owns any secrets.
- fetch install script: loop GET ${luna.url}/boot/install/${luna.node} with x-access-tokens: $token, retrying every 10 s until HTTP 200. That body is the daemon-assembled templ_install.cfg (§10).
- run it: /bin/sh /lunatmp/install.sh. On success the loop ends; on failure the whole token→fetch→run loop repeats, forwarding /tmp/luna_install.log to luna.loghost via logger.
- service mode: if luna.service=1, skip the install entirely and drop to an interactive shell (rescue / manual work).
- luna_finish: forward logs; kill sshd/dropbear/dhclient; flush and down the NIC; move the install log to /sysroot/var/log/; kill the tty2 shell. Control returns to dracut, which switch_roots into the freshly installed /sysroot (the pivot).
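The retry structure of luna2-start.sh (token, then fetch until HTTP 200, then run, and start over on failure) is easy to misread in prose, so here it is as a Python model. This is an illustration only: the real hook is shell, and provision() plus the injected callables are hypothetical stand-ins for the curl and tpm2 calls. Injecting sleep lets the loop be exercised without waiting.

```python
import time
from typing import Callable, List, Tuple

Fetch = Callable[[str], Tuple[int, str]]  # token -> (HTTP status, body)


def provision(fetch_token: Callable[[], str], fetch_install: Fetch,
              run_script: Callable[[str], bool], retry_delay: float = 10.0,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Model of the token -> fetch -> run loop (a sketch, not the real hook)."""
    while True:
        token = fetch_token()                  # POST /tpm/<node> -> JWT
        status, body = fetch_install(token)    # GET /boot/install/<node>
        while status != 200:                   # retry every 10 s until HTTP 200
            sleep(retry_delay)
            status, body = fetch_install(token)
        if run_script(body):                   # /bin/sh /lunatmp/install.sh
            return                             # success: luna_finish, then switch_root
        # failure: loop again, starting with a fresh token


# Usage with fakes: one 503, then a failing script, then a succeeding one.
responses = iter([(503, ""), (200, "exit 1"), (200, "exit 0")])
tokens: List[str] = []
pauses: List[float] = []
runs: List[str] = []


def fake_token() -> str:
    tokens.append("jwt")
    return "jwt"


def fake_run(body: str) -> bool:
    runs.append(body)
    return body == "exit 0"


provision(fake_token, lambda token: next(responses), fake_run, sleep=pauses.append)
```

After the first script fails, a second token is requested before the install script is fetched again, matching "the whole token→fetch→run loop repeats".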
12.4 Contract between the two halves
The daemon side (base/boot.py → templ_install.cfg) only produces the install script; it relies on luna2-client to (a) accept root=luna, (b) bring up the right NIC by MAC, (c) authenticate via /tpm/<node>, and (d) fetch and execute the script before pivot. The kernel arguments that glue them together are emitted by templ_nodeboot.cfg (rendered by Boot().discover_mac) and consumed by luna2-start.sh:
| Kernel arg | Set by (daemon) | Used by (luna2-client) |
|---|---|---|
| root=luna / boot=ramdisk | templ_nodeboot.cfg | luna-parse-cmdline.sh (rootok=1) |
| luna.url | controller API endpoint | /tpm/<node> + /boot/install/<node> calls |
| luna.node / luna.hostname | resolved node | node identity, hostname |
| luna.mac | node BOOTIF MAC | find_nic → bring up NIC |
| luna.ip / luna.gw / luna.bootproto | node IP config | static vs DHCP network setup |
| luna.loghost | controller / beacon | remote log forwarding via logger |
| luna.service | node service flag | 1 → service/rescue shell, skip install |
| luna.verifycert | API verify-certificate setting | --insecure toggle on curl |
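To make the contract concrete, here is a Python model of what the node side reads from the kernel command line. The real module does this in shell inside the initramfs, so treat parse_luna_cmdline and rootok as illustrative names only. Treating a bare luna.* flag as "1" is an assumption, and the controller URL in the sample command line is made up.

```python
import shlex
from typing import Dict


def parse_luna_cmdline(cmdline: str) -> Dict[str, str]:
    """Extract root= and luna.* arguments from a kernel command line (e.g. /proc/cmdline)."""
    args: Dict[str, str] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key == "root" or key.startswith("luna."):
            args[key] = value if sep else "1"  # bare flag treated as "1" (assumption)
    return args


def rootok(args: Dict[str, str]) -> bool:
    """Equivalent of luna-parse-cmdline.sh: [ $root = "luna" ] && rootok=1."""
    return args.get("root") == "luna"


cmdline = ("initrd=initrd.img root=luna boot=ramdisk "
           "luna.url=https://controller.example:7050 luna.node=node001 "
           "luna.mac=aa:bb:cc:dd:ee:01 luna.bootproto=dhcp luna.service=0")
args = parse_luna_cmdline(cmdline)
print(rootok(args), args["luna.node"], args["luna.bootproto"])  # prints: True node001 dhcp
```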
luna2-client is intentionally OS-image-resident and arch/distro-specific (note the main-arm, *-opensuse and twans_new branches). When adding support for a new distro, you typically pair a new boot/network/<distro>.py plugin on the daemon (§6.4) with a matching luna2-client build, so the baked-in initrd has the right tools.
13. Worked examples — request walkthroughs
These tie the layers together. Inline markers (rendered faintly) show where each subsystem is incorporated:
(HA) replication touchpoint · (PLUGIN) a plugin is selected & loaded · (QUEUE) handed to a background mother (§16.1) · (SVC) service config re-rendered from a template. Status/return behaviour follows §15.
13.1 luna osimage pack → GET /config/osimage/<name>/_pack
Packs a chroot image tree into a bootable kernel / initrd / tarball.
- Route config_osimage_pack (@token_required). (HA) checks HA().get_hastate() / get_role():
  - not master → Journal().add_request("OSImage.pack", object=name, masteronly=True, misc=request_id) and returns a request_id; the master does the packing, and the CLI polls status by id. (HA)
  - master (or non-HA) → continue locally.
- OSImage().pack(name) clears the changed flag, then (QUEUE) add_task_to_queue("pack_n_build_osimage", subsystem="osimage"); if first in line, it wakes osimage_mother_wrapper() in a child process.
- (QUEUE) osimage_tasks_mother / the child runs pack_osimage:
  - (PLUGIN) osimage/filesystem/<IMAGE_FILESYSTEM> → getpath / mount the tree (override for ZFS/btrfs);
  - (PLUGIN) osimage/operations/image/<distribution>·<osrelease> → the distro-specific pack (kernel, initrd, tarball into IMAGE_FILES).
- Progress streams to /monitor / the status table under request_id; the CLI tails it (§15).
- (HA) On success the route calls Journal().queue_source_sync(name, request_id) → a sync_osimage_with_master task, so every peer pulls the new files (and unpacks them if syncimages, §8.6).
Plugins: filesystem + per-distro image operation. HA: master-routing up front, file replication afterwards.
13.2 luna node show → GET /config/node/<name>
The read path — deliberately simple, no side effects.
- Route config_node_get (@token_required @validate_name) → Node().get_node(name).
- base assembles the dataset: Database().get_record_join over node ⋈ group ⋈ nodeinterface …, applying node-overrides-group inheritance (osimage, kernel options, bmc, scripts, netboot).
- Returns JSON; the CLI renders the table.
Plugins: none. HA: none for the read — it is served straight from this controller's local replica, which the journal keeps in step with its peers, so any controller answers identically.
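The node-overrides-group rule can be illustrated as a plain dictionary merge. This is a sketch, not Database().get_record_join: it assumes that an unset node field is stored as None and that only an unset field falls back to the group. The field names in INHERITED and the helper effective_node are hypothetical.

```python
from typing import Any, Dict, Iterable

# Illustrative names for the inherited settings listed above (osimage, kernel options,
# bmc, scripts, netboot); the real column names may differ.
INHERITED = ("osimage", "kernel_options", "bmcsetup", "prescript", "netboot")


def effective_node(node: Dict[str, Any], group: Dict[str, Any],
                   fields: Iterable[str] = INHERITED) -> Dict[str, Any]:
    """Return a copy of the node record with unset (None) fields taken from its group."""
    merged = dict(node)  # never mutate the stored record
    for field in fields:
        # "is None" rather than falsiness: an explicit netboot=False must survive.
        if merged.get(field) is None:
            merged[field] = group.get(field)
    return merged


group = {"osimage": "rocky9", "kernel_options": "quiet", "bmcsetup": "ipmi",
         "prescript": None, "netboot": True}
node = {"name": "node001", "osimage": None, "kernel_options": "console=ttyS0",
        "bmcsetup": None, "prescript": None, "netboot": False}
eff = effective_node(node, group)
print(eff)
```

The None-versus-False distinction is the subtle part: a falsy check would silently re-enable netboot on a node the operator disabled.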
13.3 luna node change → POST /config/node/<name>
- Route config_node_post (@provision_token_required, so in-install scripts may also update; @validate_name; @input_filter(checks=['config:node'])).
- (HA) Journal().add_request("Node.update_node", object=name, payload=request.data) runs first. This records the entry for replication and gates on insync (a not-yet-synced controller refuses the write). If it returns true:
- Node().update_node(name, data) runs locally: it validates and writes the node/interface rows, then:
  - (QUEUE) enqueues restart dhcp, restart dhcp6, reload dns, and a run_bulk node:master task;
  - (PLUGIN) loads hooks/config/node and calls the node hook so sites can react to the change.
- (HA) On every peer, handle_requests replays Node.update_node(name, payload). The same method, queue calls and hook run there too, so all controllers converge.
Plugins: hooks/config/node. HA: journalled write + identical replay; (QUEUE) decouples the dhcp/dns regeneration.
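The journalled-write pattern (record the call by method name, refuse it when out of sync, replay the identical method on each peer) can be shown with a toy model. Nothing here is the daemon's Journal class: Controller, the in-memory entries list and the direct replay onto peer objects are hypothetical simplifications. In the daemon, replay happens in handle_requests on each peer, driven by journal_mother.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Controller:
    name: str
    insync: bool = True
    nodes: Dict[str, dict] = field(default_factory=dict)

    def update_node(self, name: str, payload: dict) -> None:
        self.nodes.setdefault(name, {}).update(payload)


@dataclass
class Journal:
    """Toy journal: record a write, refuse it when out of sync, replay it on peers."""
    local: Controller
    peers: List[Controller]
    entries: List[Tuple[str, str, dict]] = field(default_factory=list)

    def add_request(self, function: str, object: str, payload: dict) -> bool:
        # "object" mirrors the keyword used in the walkthrough above.
        if not self.local.insync:
            return False  # a not-yet-synced controller refuses the write
        self.entries.append((function, object, payload))
        return True

    def handle_requests(self) -> None:
        """Replay every entry with the same method on each peer, then clear the journal."""
        for function, obj, payload in self.entries:
            method_name = function.split(".", 1)[1]  # "Node.update_node" -> "update_node"
            for peer in self.peers:
                getattr(peer, method_name)(obj, payload)
        self.entries.clear()


c1, c2, c3 = Controller("c1"), Controller("c2"), Controller("c3")
journal = Journal(local=c1, peers=[c2, c3])
payload = {"osimage": "rocky9"}
if journal.add_request("Node.update_node", object="node001", payload=payload):
    c1.update_node("node001", payload)  # runs locally first
journal.handle_requests()
print([c.nodes for c in (c1, c2, c3)])
```

Replaying the method rather than copying rows is what makes each peer also run its own queue calls and hooks.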
13.4 luna network change → POST /config/network/<name>
- Route config_network_post (@token_required @validate_name @input_filter(checks=['config:network'])).
- (HA) Journal().add_request("Network.update_network", object=name, payload=request.data) → if accepted:
- Network().update_network(name, data) validates DHCP ranges and the mutually exclusive DHCP modes, writes the network row, may re-allocate IPs (QUEUE), then:
  - (SVC) Service().queue('dns','reload'), Service().queue('dhcp','restart'), Service().queue('dhcp6','restart') → the DHCP/DNS configs are re-rendered from the templ_kea-dhcp4/6.cfg and templ_dns_* templates and the services bounced.
- (HA) Replayed on every peer, so each controller re-renders its own Kea/named config and restarts its own dhcp/dns; there is no shared config file.
Plugins: none directly (templates, not plugins, drive dhcp/dns output). HA: journalled write + per-controller service regeneration.
13.5 luna control / lpower — power one node, and a bunch (threaded)
Two endpoints, one plugin path. Power is a live action, so, unlike the journalled writes in §13.3/13.4, it is not journalled.
One node (synchronous) — lpower n1 on → GET /control/action/power/n1/_on:
- Route control_action_get (@token_required) → Control().control_action("n1","power","on"), which blocks until done.
- base joins node ⋈ BMC interface ⋈ bmcsetup (the node's bmcsetupid, else the group's) to get the BMC IP + credentials.
- (PLUGIN) NodeControl.control_action → plugin_load("control", [nodename, group]) selects the power backend (default ipmitool, or dell) node→group→default → control_plugin().power_on(...) → (True, "…Up/On").
- (PLUGIN) plugin_load("hooks/control", [nodename, group]) runs the optional site hook; on success, Monitor().update_nodestatus.
- Returns (status, {"control": {"power": …}}) → HTTP 204 / 200 / 501. No (HA): it runs on whichever controller received it.
A bunch (threaded, asynchronous) — lpower node[001-064] on → POST /control/action/power/_on (body lists the nodes):
- Route control_action_post (@token_required @input_filter(checks=['control'])) → Control().bulk_action(data).
- base builds a pipeline of nodes, reads BMC_BATCH_SIZE / BMC_BATCH_DELAY (the BMCCONTROL config, §17.1), generates a request_id, and (QUEUE) spawns NodeControl().control_mother(pipeline, request_id, size, delay) in a thread, then returns the request_id immediately (§15).
- control_mother loops while the pipeline has nodes: a ThreadPoolExecutor(max_workers=10) fires control_child × batch, each popping a node and running the same (PLUGIN) control/[node,group] path; per-node results (n012:power on:True) are appended to the status table; it sleeps delay between batches to spare the BMCs, then writes EOF.
- The CLI polls GET /control/status/<request_id> and streams per-node outcomes until EOF (§15).
Plugins: control + hooks/control, selected per node→group→default. HA: none (live action). Scale: BMC_BATCH_SIZE/BMC_BATCH_DELAY throttle how many BMCs are hit at once.
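The batching in control_mother can be sketched with the standard library. This is a simplified model, not NodeControl: nodes are popped in the calling thread rather than by each control_child, the status table is a plain list, and the action callable stands in for the control plugin. Skipping the pause after the final batch is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def control_mother(pipeline: List[str], action: Callable[[str], bool], status: List[str],
                   batch_size: int, delay: float,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run action on nodes in batches of batch_size, pausing delay seconds between batches."""
    with ThreadPoolExecutor(max_workers=10) as pool:
        while pipeline:
            batch = [pipeline.pop(0) for _ in range(min(batch_size, len(pipeline)))]
            # Each worker runs the per-node plugin path; results come back in batch order.
            for node, ok in zip(batch, pool.map(action, batch)):
                status.append(f"{node}:power on:{ok}")
            if pipeline:
                sleep(delay)  # spare the BMCs between batches
    status.append("EOF")      # terminator the CLI polls for


status: List[str] = []
pauses: List[float] = []
nodes = [f"node{i:03d}" for i in range(1, 6)]
# Fake backend: node004's BMC "fails".
control_mother(nodes, lambda n: n != "node004", status, batch_size=2, delay=1.5,
               sleep=pauses.append)
print(status)
```

With five nodes and a batch size of 2 there are three batches and two pauses, so BMC load is bounded by the batch size regardless of how many nodes the CLI listed.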
14. Startup — first boot, bootstrap & schema
What the daemon does the first time it starts (and re-checks on every start): it validates or seeds the cluster from a bootstrap file against a declarative database schema.
14.1 bootstrap.py — first-run cluster bootstrap
validate_bootstrap() is called by luna.py:on_starting before any worker forks (§16). It:
- checks that the database is reachable (db_status() detects the driver: SQLite / MySQL / PostgreSQL) and whether the schema already exists (DBStructure().check_db_tables());
- if /trinity/local/luna/daemon/config/bootstrap.ini is present and the tables are empty, parses it and seeds the initial cluster: HOSTS (hostname, controller, nodelist, domain), NETWORKS (cluster/ipmi/ib), GROUPS, OSIMAGE, BMCSETUP;
- runs post-start fixes regardless: verify_and_set_beacon(), legacy_and_forward_fixes(), cleanup_queue_and_status(), cleanup_and_init_ping().
Returns False (daemon exits) if the DB is unavailable, or if the schema needs init but bootstrap.ini is missing. After a successful bootstrap the operator removes bootstrap.ini.
14.2 database_layout.py — declarative schema
A pure-data module: ~30 DATABASE_LAYOUT_<table> lists, each describing a table's columns/types/keys. utils/dbstructure.py reads these to create and upgrade tables (driven by the bootstrap table check, §14.1). Adding a column = editing the layout list; there are no separate migration scripts.
Tables defined: node, group, network, nodeinterface, groupinterface, ipaddress, reservedipaddress, osimage, osimagetag, bmcsetup, switch, otherdevices, rack, rackinventory, cloud, controller, cluster, dns, roles, nodesecrets, groupsecrets, user, monitor, status, queue, tracker, journal, ha, ping, reference.
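"Adding a column = editing the layout list" works because the structure code compares the declared layout with what the database actually has. The sketch below shows that idea with SQLite only. The dictionary keys (column, datatype, key), the sample rack layout and the sync_table helper are hypothetical; the real DATABASE_LAYOUT_<table> format and utils/dbstructure.py logic may differ, and they also cover MySQL and PostgreSQL.

```python
import sqlite3
from typing import Dict, List

# Hypothetical layout format; the real DATABASE_LAYOUT_<table> lists may use other keys.
DATABASE_LAYOUT_rack: List[Dict[str, str]] = [
    {"column": "id", "datatype": "INTEGER", "key": "PRIMARY KEY"},
    {"column": "name", "datatype": "VARCHAR(64)"},
    {"column": "size", "datatype": "INTEGER"},
]


def sync_table(conn: sqlite3.Connection, table: str, layout: List[Dict[str, str]]) -> None:
    """Create the table if missing, otherwise ADD any column the layout has but the table lacks."""
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if not existing:
        cols = ", ".join(
            f'"{c["column"]}" {c["datatype"]} {c.get("key", "")}'.strip() for c in layout)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        return
    for c in layout:
        if c["column"] not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{c["column"]}" {c["datatype"]}')


conn = sqlite3.connect(":memory:")
sync_table(conn, "rack", DATABASE_LAYOUT_rack)  # first start: create
DATABASE_LAYOUT_rack.append({"column": "location", "datatype": "VARCHAR(128)"})
sync_table(conn, "rack", DATABASE_LAYOUT_rack)  # later start: upgrade in place
print([row[1] for row in conn.execute('PRAGMA table_info("rack")')])
# prints: ['id', 'name', 'size', 'location']
```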
15. The return contract — how status & messages travel
Every layer of luna2-daemon speaks the same shape: a method returns (status, message_or_data) — a boolean plus a string or dict — and, for work that outlives the HTTP request, a third element request_id. This single convention is what lets a message born in the deepest place (a plugin, a child thread on a peer controller) surface verbatim at the CLI, with no per-endpoint plumbing.
The tuple, layer by layer (a power-on, deepest → outermost):
control plugin power_on() -> (True, "Chassis Power Control: Up/On") (PLUGIN)
utils NodeControl.control_action() -> (True, "Chassis Power Control: Up/On")
base Control.control_action() -> (True, {"control": {"power": "...Up/On"}})
route control_action_get() -> json body + HTTP 204 / 200 / 501
CLI lpower n1 on -> prints the message
base returns either a 2-tuple (status, response) or, for deferred work, a 3-tuple (status, response, request_id). The route turns status into an HTTP code — often through Helper().get_access_code(status, response) — and JSON-encodes response.
Two channels:
- Synchronous: the (status, message) tuple bubbles straight up and becomes the HTTP body. Used for reads and quick mutations (node show, node change, network change).
- Asynchronous status stream: for queued, threaded or long-running work (osimage pack, bulk lpower), the base returns immediately with (True, response, request_id) while the deep code keeps appending progress to the status table:
Status().add_message(request_id, "lpower", "node005:power on:True", status=200)
…
Status().add_message(request_id, "lpower", "EOF") # terminator
The CLI then polls GET /control/status/<request_id> (or /status/<request_id>); the daemon replays the accumulated messages (deleting them as they are read) until it sees EOF. This is how a line emitted by a child thread — or by a plugin running inside osimage_mother — reaches the user after the original request already returned.
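The produce/drain/EOF protocol can be modelled in memory to show why a late message from a background thread still reaches the poller. StatusTable and poll are hypothetical stand-ins: the real Status class writes to the status database table, and the CLI polls over HTTP. The second argument of add_message is named initiator here only for illustration.

```python
import threading
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class StatusTable:
    """In-memory stand-in for the status table: append messages, drain them on read."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, List[Tuple[str, str, int]]] = defaultdict(list)

    def add_message(self, request_id: str, initiator: str, message: str,
                    status: int = 200) -> None:
        with self._lock:
            self._rows[request_id].append((initiator, message, status))

    def read(self, request_id: str) -> Tuple[List[str], bool]:
        """Return and delete pending messages; the flag is True once EOF has been seen."""
        with self._lock:
            rows = self._rows.pop(request_id, [])
        messages = [msg for _, msg, _ in rows if msg != "EOF"]
        return messages, any(msg == "EOF" for _, msg, _ in rows)


def poll(table: StatusTable, request_id: str, interval: float = 0.01,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """What the CLI does: keep reading until the EOF terminator arrives."""
    seen: List[str] = []
    while True:
        messages, done = table.read(request_id)
        seen.extend(messages)
        if done:
            return seen
        sleep(interval)


table = StatusTable()


def worker() -> None:  # plays the role of a control_child or osimage_mother
    for node in ("node001", "node002"):
        table.add_message("req-1", "lpower", f"{node}:power on:True")
    table.add_message("req-1", "lpower", "EOF")


thread = threading.Thread(target=worker)
thread.start()
lines = poll(table, "req-1")
thread.join()
print(lines)
```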
Across HA. When a request was journalled and replayed on a peer (§8.4), handle_requests captures the replayed method's (status, message, request_id) and calls Status().forward_status_request(...), so progress produced on the master is forwarded back to the controller the user actually talked to. The return contract therefore spans controllers, not just layers.
Why it matters: because every boundary (plugin, utils, base, route) returns the same (status, message[, request_id]), any layer can answer directly or defer to the stream, and an error string from the lowest plugin reaches the CLI unaltered. New endpoints inherit this for free.
16. Process lifecycle & background workers
Gunicorn drives the daemon through hooks defined in luna.py and wired in gunicorn.py:
| Hook | When | What luna.py does |
|---|---|---|
| on_starting | master, before fork | validate bootstrap; enqueue initial dhcp/dhcp6/dns (re)starts; fire hooks/luna startup plugin |
| post_worker_init | each worker, after fork | elect one worker (non-blocking flock on /var/lib/luna2-daemon-background.lock) to run the singleton background threads |
| worker_exit | worker exits | stop that worker's background threads |
| on_reload | service reload | drain the queue |
| on_exit | service stop | fire hooks/luna shutdown plugin |
| worker_abort | worker timeout | dump a traceback for debugging |
The elected worker starts these Housekeeper / PluginSync "mother" threads (start_background_workers()):
- status-message cleanup (cleanup_mother)
- queue housekeeper / task drain (tasks_mother)
- switch/port/MAC detection (switchport_scan)
- boot-plugin sync between HA controllers (boot_plugins_mother)
- journal / replication (journal_mother) — see §8
- invalid-config sweep (invalid_config_mother)
- osimage tasks (osimage_tasks_mother)
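The post_worker_init election relies on an exclusive, non-blocking flock: the first worker to take the lock runs the mothers, and every other worker gets BlockingIOError and skips them. The sketch below shows that mechanism with Python's fcntl module (Linux/Unix only). try_elect is a hypothetical helper, and a temporary path stands in for /var/lib/luna2-daemon-background.lock. Two separate open() calls are used because flock conflicts between open file descriptions, which is also what separates forked workers.

```python
import fcntl
import os
import tempfile
from typing import Optional, TextIO


def try_elect(lock_path: str) -> Optional[TextIO]:
    """Return an open handle if this caller won the non-blocking flock, else None.

    The winner must keep the handle open for its lifetime: closing it releases the lock.
    """
    handle = open(lock_path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()  # someone else already holds the lock
        return None
    return handle


path = os.path.join(tempfile.mkdtemp(), "background.lock")
first = try_elect(path)   # e.g. worker 1 wins and starts the background threads
second = try_elect(path)  # e.g. worker 2 loses and serves requests only
print(first is not None, second is None)  # prints: True True
```

Because the kernel drops the lock when the winning process exits, a replacement worker can win the election without any stale-lock cleanup.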
16.1 The background "mothers"
utils/housekeeper.py (Housekeeper) defines most of these singleton loops; each *_mother runs while True guarded by the shared stop event, and publishes its health to /monitor (the mother item, 200/500). They run only on the flock-elected worker and, where relevant, only while the controller is HA-insync and/or master.
| Thread | Cadence | Responsibility |
|---|---|---|
| tasks_mother | drains the housekeeper queue continuously | the workhorse dispatcher, a match over queued tasks: service restart/reload (dhcp, dhcp6, dns), sync_osimage_with_master (§8.6), provision_osimage, unpack_osimage, remote osimage removal, etc. |
| osimage_tasks_mother | periodic, master-only | waits for insync; if not master, clears queued osimage tasks and returns; expires stale tasks; spawns the heavy image worker (OsImage().osimage_mother_wrapper) in a separate process to run pack/build/grab/push/copy |
| cleanup_mother | periodic | expires old status / message buffers so the status table doesn't grow unbounded |
| switchport_scan | periodic | loads the switchport detection plugin and scans configured switches to learn which MAC sits on which port; feeds node identification at boot (§9.1) |
| invalid_config_mother | periodic | sweeps for incomplete/invalid configuration (e.g. nodes/interfaces missing required data) and flags it |
| journal_mother | ~5 s | HA replication loop (§8.5) |
| boot_plugins_mother | periodic | lives in utils/plugin_sync.py (PluginSync), not housekeeper; replicates boot-plugin files between HA controllers |
Each loop is defensive: exceptions are caught, logged with line numbers, and surfaced as a 500 on the mother monitor item; on the next clean pass it flips back to 200. The heavy osimage work is deliberately pushed to a child process (not just a thread) so a crash or signal there can't take down the worker.
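The defensive shape described above (catch everything, log a traceback, publish 500, flip back to 200 on the next clean pass, stop promptly) fits in a short loop. This is a hypothetical sketch, not Housekeeper code: the mother function, the health dict and the logger name are illustrative. Waiting on the stop event instead of calling time.sleep lets worker_exit stop the thread without waiting a full interval. The demo runs inline rather than in a thread so its result is deterministic.

```python
import logging
import threading
from typing import Callable, Dict, List, Optional

LOGGER = logging.getLogger("luna.housekeeper")  # hypothetical logger name


def mother(name: str, work: Callable[[], None], stop: threading.Event,
           health: Dict[str, int], interval: float) -> None:
    """Run work() every interval seconds until stop is set, publishing 200/500 health."""
    while not stop.is_set():
        try:
            work()
            health[name] = 200  # a clean pass flips the item back to 200
        except Exception:
            health[name] = 500  # never let one bad pass kill the thread
            LOGGER.exception("%s pass failed", name)  # traceback includes line numbers
        stop.wait(interval)     # sleeps, but wakes immediately when stop is set


stop = threading.Event()
health: Dict[str, int] = {}
seen: List[Optional[int]] = []


def flaky_scan() -> None:
    seen.append(health.get("switchport_scan"))  # health as observed at the start of a pass
    if len(seen) == 1:
        raise RuntimeError("switch unreachable")
    stop.set()  # second pass succeeds and ends the demo


mother("switchport_scan", flaky_scan, stop, health, interval=0)
print(seen, health)  # prints: [None, 500] {'switchport_scan': 200}
```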
17. The common package — constants & validation
daemon/common/ is the foundation imported by almost every other module. It is not a request-handling layer; it provides process-wide configuration and the auth/input decorators. (The first-run bootstrap and the database schema — also part of common/ — are covered separately under Startup, §14.)
17.1 constant.py — CONSTANT, LOGGER, LUNAKEY
Imported once at process start; reads /trinity/local/luna/daemon/config/luna.ini into the global CONSTANT dict and exposes the shared LOGGER and LUNAKEY. Everything else simply does from common.constant import CONSTANT, LOGGER.
- Parses the ini into a fixed section schema (below) and sanity-checks that the log dir, image dir, template dirs, plugin dir and keyfile exist and are writable.
- Normalises human durations to seconds: EXPIRY (h/m/s, default 24h), COOLDOWN, MAXPACKAGINGTIME, BMC_BATCH_DELAY.
- Loads LUNAKEY from FILES.KEYFILE, the symmetric key used to encrypt/decrypt node & group secrets.
- Copies the template files listed in TEMPLATES.TEMPLATE_LIST (a JSON manifest) into the working TMP_DIRECTORY.
| CONSTANT section | Key settings |
|---|---|
| LOGGER | LEVEL, LOGFILE |
| API | USERNAME, PASSWORD, EXPIRY, SECRET_KEY, PROTOCOL, ENDPOINT |
| DATABASE | DRIVER, DATABASE, DBUSER, DBPASSWORD, HOST, PORT |
| FILES | KEYFILE, IMAGE_FILES, IMAGE_DIRECTORY, TMP_DIRECTORY, MAXPACKAGINGTIME |
| PLUGINS | PLUGINS_DIRECTORY, IMAGE_FILESYSTEM |
| SERVICES | DHCP, DNS, CONTROL, COOLDOWN, COMMAND |
| DHCP | OMAPIKEY |
| BMCCONTROL | BMC_BATCH_SIZE, BMC_BATCH_DELAY |
| TEMPLATES | TEMPLATE_FILES, TEMPLATE_LIST, TMP_DIRECTORY, VARS |
CONSTANT is read at import only; editing luna.ini requires a daemon restart to take effect.
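The duration normalisation in the list above (EXPIRY as h/m/s) can be sketched with configparser and a small regex. to_seconds is a hypothetical helper, not the code in constant.py. Treating a bare number as seconds is an assumption, and the exact accepted spellings in the real parser may differ.

```python
import configparser
import re

_UNITS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(value: str, default_unit: str = "s") -> int:
    """Convert '24h', '30m', '90s' or a bare number (assumed seconds) to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([hms]?)\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or default_unit]


config = configparser.ConfigParser()
config.read_string("[API]\nEXPIRY = 24h\n[SERVICES]\nCOOLDOWN = 2m\n")
expiry = to_seconds(config.get("API", "EXPIRY", fallback="24h"))
cooldown = to_seconds(config.get("SERVICES", "COOLDOWN"))
print(expiry, cooldown)  # prints: 86400 120
```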
17.2 validate_auth.py — token decorators
JWT (HS256, signed with API.SECRET_KEY) verification wrappers used by the routes:
| Decorator | Use |
|---|---|
| @token_required | standard protected endpoints; rejects a provision-scoped token (403), since those may only reach install endpoints |
| @provision_token_required | endpoints the in-install kickstart must reach; accepts an admin token or a node-scoped provision token whose node claim matches the requested node/name |
| @agent_check | sets cli=True/False from the User-Agent (Luna2-web ⇒ GUI, else CLI) |
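The difference between the two token decorators comes down to a small authorization rule, applied after the HS256 signature has been verified (for example with PyJWT's jwt.decode(token, key, algorithms=["HS256"])). The sketch below covers only that rule, on already-decoded claims. The claim names "scope" and "node" and the authorize helper are hypothetical; the daemon's real claim layout is not shown here.

```python
from typing import Any, Dict


def authorize(claims: Dict[str, Any], provision_endpoint: bool, requested_node: str) -> int:
    """Return an HTTP status for already-verified JWT claims (200 = allowed)."""
    is_provision = claims.get("scope") == "provision"  # hypothetical claim name
    if not provision_endpoint:
        # @token_required: provision-scoped tokens may only reach install endpoints.
        return 403 if is_provision else 200
    if not is_provision:
        return 200  # admin token on an install endpoint
    # @provision_token_required: the node claim must match the requested node/name.
    return 200 if claims.get("node") == requested_node else 403


admin = {"id": "luna"}
prov = {"scope": "provision", "node": "node001"}
print(authorize(prov, True, "node001"), authorize(prov, True, "node002"))  # prints: 200 403
```

Scoping the provision token to one node means a compromised initramfs can only read or update its own node's data.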
This is the JWT consumed by luna2-client after it authenticates at /tpm/<node> (§12.3).
17.3 validate_input.py — boundary input sanitation
The security layer applied at the HTTP edge (the "validate at system boundaries" rule). It holds a dictionary of named regex patterns and the decorators that enforce them:
- @validate_name: sanitises the name / path params on a route.
- input_filter(checks=[...]): validates named JSON-body fields before the handler runs; returns 400 on mismatch.
- Strips control characters; named patterns include name, strictname, interface, ipaddress, macaddress, domainname, integer, loosecsv, anything, etc.
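A body-validating decorator of this kind (strip control characters, check named fields against named patterns, return 400 before the handler runs) can be sketched as follows. check_fields is a hypothetical stand-in with a different signature from the real input_filter(checks=[...]), and the three patterns are illustrative rather than copies of the daemon's dictionary.

```python
import functools
import re
from typing import Any, Callable, Dict, Tuple

# Illustrative patterns; the real validate_input.py dictionary is larger and may differ.
PATTERNS = {
    "name": re.compile(r"[a-z0-9][a-z0-9_.-]{0,62}", re.IGNORECASE),
    "ipaddress": re.compile(r"(\d{1,3}\.){3}\d{1,3}"),
    "macaddress": re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE),
}
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

Handler = Callable[[Dict[str, Any]], Tuple[int, str]]


def check_fields(fields: Dict[str, str]) -> Callable[[Handler], Handler]:
    """Validate named body fields against named patterns; respond 400 on the first mismatch."""
    def decorator(handler: Handler) -> Handler:
        @functools.wraps(handler)
        def wrapper(body: Dict[str, Any]) -> Tuple[int, str]:
            clean = {k: (CONTROL_CHARS.sub("", v) if isinstance(v, str) else v)
                     for k, v in body.items()}
            for field, pattern in fields.items():
                value = clean.get(field)
                if value is not None and not PATTERNS[pattern].fullmatch(str(value)):
                    return 400, f"invalid {field}: {value!r}"
            return handler(clean)  # the handler only ever sees sanitised input
        return wrapper
    return decorator


@check_fields({"name": "name", "macaddress": "macaddress"})
def update_interface(body: Dict[str, Any]) -> Tuple[int, str]:
    return 204, "ok"


print(update_interface({"name": "node001; rm -rf /"}))
```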
18. Quick reference
| I want to… | Look at |
|---|---|
| Add a new download method | plugins/boot/provision/<name>.py (define fetch, create, cleanup) |
| Support a new distro's networking | plugins/boot/network/<distro>.py (init/interface/gateway/dns) plus a matching luna2-client build (§12) |
| Change the node-side pre-pivot behaviour | luna2-client repo → */src/usr/lib/dracut/modules.d/95luna/luna2-start.sh |
| Add a tool to the install initramfs | luna2-client module-setup.sh (dracut_install …) + etc/dracut.conf.d/luna2.conf |
| Change the iPXE menu | templates/templ_boot_ipxe.cfg + Boot().default() |
| Change the installer steps | templates/templ_install.cfg (order at bottom) + matching ## CODE SEGMENT markers |
| Add an API resource | new routes/<x>.py blueprint + register in luna.py + base/<x>.py class |
| Add a power backend | plugins/control/<vendor>.py |
| Hook startup/shutdown behaviour | plugins/hooks/luna/default.py |
| Run logic on every Nth worker only | start_background_workers() in luna.py (flock-elected) |
Generated from the luna2-daemon development branch. File and method references are accurate as of HEAD 371511ce.