No, U dev
This week in Liminix: also, last week and for several weeks preceding, it has been all about the “device database”.
To recap, we wish to run certain services only under particular conditions. The particular use case I have here is my backup server, which is a GL.iNet travel router that runs rsyncd, with a USB disk plugged into it. (There’s historical resonance here: a lightweight backup server was the original reason I started writing NixWRT.) The system shouldn’t try to mount the external drive unless it’s plugged in - if it starts the mount service at boot, the service startup will hang, and that means the machine can’t be cleanly rebooted - nor can any other change be made to the running services - until the disk is attached. Ugly.
The disk is present when there’s a node in sysfs with a uevent file containing the attributes DEVTYPE=partition and PARTNAME=backup-disk.
# cat /sys/class/block/sda1/uevent
MAJOR=8
MINOR=1
DEVNAME=sda1
DEVTYPE=partition
DISKSEQ=7
PARTN=1
PARTNAME=backup-disk
But we don’t know where under /sys the file is: the kernel allocates sdX devices as it sees them, so it might depend on how many other storage devices are plugged in.
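As a minimal sketch of the presence test described above (not code from Liminix itself), here is how the KEY=VALUE lines of a uevent file could be parsed and checked against the two attributes. The function names `parse_uevent` and `matches` are made up for illustration.

```python
def parse_uevent(text):
    """Turn the KEY=VALUE lines of a sysfs uevent file into a dict."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            attrs[key] = value
    return attrs


def matches(attrs, wanted):
    """True when every wanted attribute is present with the wanted value."""
    return all(attrs.get(key) == value for key, value in wanted.items())


# The contents of /sys/class/block/sda1/uevent shown above.
SAMPLE = """MAJOR=8
MINOR=1
DEVNAME=sda1
DEVTYPE=partition
DISKSEQ=7
PARTN=1
PARTNAME=backup-disk
"""

WANTED = {"DEVTYPE": "partition", "PARTNAME": "backup-disk"}

print(matches(parse_uevent(SAMPLE), WANTED))  # prints True
```

The check itself is trivial; the hard part is knowing which file to run it against.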
The naive solution (don’t do this) would be to recursively walk the whole of /sys every few seconds. Thankfully we don’t have to, because the kernel sends “uevent” netlink messages whenever anything changes. So we built “devout”, a service that maintains a model of all the hardware by listening to these messages and updating a database (using the term in its loosest sense) of the state. Then we can have a client (or many clients) connect to the database service and say “send me all the events matching some criteria I am interested in”. Devout will send it the messages for all matching devices it knows about at connection time, then relay further relevant netlink messages to it until it disconnects again.
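To make the shape of that concrete, here is a rough in-memory sketch of the idea. It is not devout's actual implementation, and the class and function names are invented for illustration. It assumes the usual kernel uevent message layout (an `action@devpath` header followed by NUL-separated KEY=VALUE fields), and it assumes that devices already known at subscription time are replayed to the new subscriber as "add" events.

```python
def parse_message(raw):
    """Split a raw uevent message into (action, attributes)."""
    fields = raw.split(b"\0")
    action, _, devpath = fields[0].decode().partition("@")
    attrs = {}
    for field in fields[1:]:
        key, sep, value = field.decode().partition("=")
        if sep:  # ignore the empty field after the trailing NUL
            attrs[key] = value
    attrs.setdefault("DEVPATH", devpath)
    return action, attrs


def matches(attrs, criteria):
    return all(attrs.get(key) == value for key, value in criteria.items())


class DeviceDatabase:
    """Hypothetical stand-in for the model that devout maintains."""

    def __init__(self):
        self.devices = {}      # DEVPATH -> attributes
        self.subscribers = []  # (criteria, callback) pairs

    def subscribe(self, criteria, callback):
        # Assumption: replay what we already know as "add" events.
        for attrs in self.devices.values():
            if matches(attrs, criteria):
                callback("add", attrs)
        self.subscribers.append((criteria, callback))

    def handle(self, raw):
        action, attrs = parse_message(raw)
        if action == "remove":
            self.devices.pop(attrs["DEVPATH"], None)
        else:
            self.devices[attrs["DEVPATH"]] = attrs
        # Relay the event to every subscriber whose criteria it meets.
        for criteria, callback in self.subscribers:
            if matches(attrs, criteria):
                callback(action, attrs)


def fake_message(action, devpath, **attrs):
    """Build a message in the kernel's layout, for demonstration only."""
    parts = [f"{action}@{devpath}", f"ACTION={action}", f"DEVPATH={devpath}"]
    parts += [f"{key}={value}" for key, value in attrs.items()]
    return "\0".join(parts).encode() + b"\0"


db = DeviceDatabase()
path = "/devices/example/sda/sda1"  # made-up device path
db.handle(fake_message("add", path, DEVTYPE="partition", PARTNAME="backup-disk"))

seen = []
db.subscribe({"PARTNAME": "backup-disk"}, lambda action, attrs: seen.append(action))
db.handle(fake_message("remove", path, DEVTYPE="partition", PARTNAME="backup-disk"))
print(seen)  # prints ['add', 'remove']
```

In a real service the messages would arrive on a netlink socket and the subscribers would be connected clients rather than callbacks, but the bookkeeping would be along the same lines.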
The client then starts its controlled service when it gets “add” or “change” events and stops it again on “remove”.
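The client side of this can be sketched as a tiny state machine. This is an illustration, not the real Liminix client: `Trigger`, `start` and `stop` are hypothetical names, and in practice the two callables would ask the service manager to bring the controlled service up or down. The guard against starting twice is my own assumption about sensible behaviour when a "change" follows an "add".

```python
class Trigger:
    """Start a service on add/change events and stop it on remove."""

    def __init__(self, start, stop):
        self.start = start    # callable that brings the service up
        self.stop = stop      # callable that takes it down again
        self.running = False

    def on_event(self, action, attrs):
        if action in ("add", "change") and not self.running:
            self.start()
            self.running = True
        elif action == "remove" and self.running:
            self.stop()
            self.running = False


log = []
trigger = Trigger(lambda: log.append("start"), lambda: log.append("stop"))

for action in ["add", "change", "remove", "remove", "add"]:
    trigger.on_event(action, {})

print(log)  # prints ['start', 'stop', 'start']
```

Because the only things the client can do are start and stop one service, whatever "add" did is undone by "remove".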
“Why go to all this trouble when udev already exists?” It’s a fair question, and I keep asking it myself as well.
The short version is that udev rules afford a level of generality which is (so far, for our purposes) unnecessary, easy to get wrong, and hard (for me, at least) to reason about, because they allow arbitrary commands on each event and they don’t have symmetry: it’s hard to be confident that all the changes to the system which were introduced by some “add” rule are undone correctly when the device is removed. The udev rule language has jumps and conditions, which means it’s not simple to know what a rule will do when considered in isolation.
Another consideration is that I am hoping this general pattern - a trigger service which subscribes to events from another service - will be applicable for other event sources - e.g. SNMP, or rtnetlink messages, or collectd, or (fill in your application here).
Levitate me
Where we are right now, though, is that I have reinstalled my backup server, and this time I have enabled it for levitate: the mechanism for rebooting into a maintenance system that I implemented last year. And promptly discovered that for it to be any actual use, it needed some rejiggering: the tl;dr is that levitate now expects to be passed a whole config fragment, not just a list of services. So maybe now you can use levitate (and maybe I should write some documentation for it).
Failover is the mother of success
Next thing on the list is a mechanism for failover to a secondary WAN connection when the primary link goes down.