Re: [RFC v1 1/3] luo: Live Update Orchestrator
From: Pasha Tatashin
Date: Thu Mar 27 2025 - 15:30:09 EST
On Thu, Mar 20, 2025 at 3:26 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Thu, Mar 20, 2025 at 03:00:31PM -0400, Pasha Tatashin wrote:
>
> > > I also think we should give up on the sysfs. If fdbox is going forward
> > > in a char dev direction then I think we should have two char devs
> > > /dev/kho/serialize and /dev/kho/deserialize and run the whole thing
> >
> > KHO is a mechanism to preserve kernel memory across reboots. It can be
> > used independently of live update, for example, to preserve kexec
> > reboot telemetry, traces, and for other purposes. The LUO utilizes KHO
> > for memory preservation but also orchestrates specifically a live
> > update process, provides a generic way for subsystems and devices to
> > participate, handles error recovery, unclaimed devices, and other live
> > update-specific steps.
> >
> > That said, I can transition the LUO interface from sysfs to a character device.
>
> Sure, I mean pick whatever name makes sense for this whole bundle..
>
> > > through that. The concepts shown in the fdbox patches should be merged
> > > into the kho/serialize char dev as just a general architecture of open
> > > the char dev, put stuff into it, then finalize and do the kexec.
> >
> > Some participating subsystems, such as interrupts, do not have a way
> > to export a file descriptor.
>
> Interrupts that need to be preserved are owned by VFIO. Why do we need
> to preserve interrupts? I thought the model was to halt all interrupts
> and then re-inject a spurious one?
>
> > It is unclear why we would require this
> > for kernel-internal state that needs to be preserved for live update,
> > which should instead register internally.
>
> Because there is almost no kernel state which is machine global and
> unconditionally should be included. eg Interrupts for devices that are
> not doing preservation should not be serialized. Only userspace knows
> what should be preserved so you must always need a mechanism to tell
> the kernel.
>
> > IMO, the current API and state machine are quite simple (I plan to
> > present and go through them at one of the Hypervisor Live Update
> > meetings). However, I am open to changing to a different API, and we
> > can expose it through a character device.
>
> Everything seems simple before you actually try to use it :)
>
> > > Also agree with Greg, I think this needs more thoughtful patch staging
> > > with actual complete solutions. I think focusing on a progression of
> > > demonstrable kexec preservation:
> > > - A simple KVM and the VM's backing memory in a memfd is preserved
> > > - A simple vfio-noiommu doing DMA to a preserved memfd, including not
> > > resetting the device (but with no iommu driver)
> > > - iommufd
> >
> > We are working on this. However, each component builds upon the
> > previous one, so it makes sense to discuss the lower layers early to
> > get early feedback.
>
Hi Jason,
Thanks for your thoughts. I agree with your observation about
components being worked on separately when they might be intrinsically
linked, especially given that kvm/vfio/iommu all have FD counterparts
to the global state or device state.
> I think part of the problem is there are lots of people working on
> pieces as though they are separate components, and I'm not sure this
> is entirely wise, or the components are actually separate. I see
> fdbox and this luo patch series as effectively being the same
> component, just different aspects of it.
You've articulated precisely the point we discussed at LSF/MM. Based
on that conversation, the next proposal will focus on unifying FDBox
and the Live Update Orchestrator into a single, cohesive component.
Here’s a summary of the planned approach:

1. Unified Location: LUO will be moved under misc/liveupdate/ to house
the consolidated functionality.

2. User Interfaces: The primary interface will be a character device
(/dev/liveupdate) that uses an ioctl interface for control operations.
(An initial draft of this interface is available here:
https://raw.githubusercontent.com/soleen/linux/refs/heads/luo/rfc-v2.1/include/uapi/linux/liveupdate.h)
An optional sysfs interface will allow userspace applications to
monitor the LUO's state and react appropriately. For example, it would
allow systemd to load different services during different live update
states.

3. Dependency Management: The viability of preserving a specific
resource (file, device) will be checked when it initially requests
participation. However, the actual dependencies will only be pulled
and the final ordered list assembled during the prepare phase. This
avoids the churn of repeatedly adding/removing dependencies as
individual components register.

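To make the dependency handling concrete, here is a rough userspace
model (not the proposed kernel code). Every name in it (luo_resource,
luo_session, luo_register, luo_prepare) is hypothetical. It assumes
that registration only checks viability, and that the prepare phase
walks dependencies and emits each resource before the things it depends
on. That ordering is my assumption for illustration, and the sketch
does not detect dependency cycles.

```c
#include <assert.h>
#include <stddef.h>

#define LUO_MAX_DEPS 4
#define LUO_MAX_RES  16

/* Hypothetical model of a preservable resource (file, device, ...). */
struct luo_resource {
	const char *name;
	int preservable;                        /* viability flag */
	struct luo_resource *deps[LUO_MAX_DEPS];
	size_t ndeps;
	unsigned int seen_gen;                  /* last prepare pass that visited us */
};

struct luo_session {
	struct luo_resource *requested[LUO_MAX_RES];
	size_t nrequested;
	struct luo_resource *ordered[LUO_MAX_RES];
	size_t nordered;
	unsigned int gen;                       /* bumped on every prepare pass */
};

/* Registration: check viability only, do not touch dependencies yet. */
int luo_register(struct luo_session *s, struct luo_resource *r)
{
	assert(r->ndeps <= LUO_MAX_DEPS);
	if (!r->preservable || s->nrequested == LUO_MAX_RES)
		return -1;
	s->requested[s->nrequested++] = r;
	return 0;
}

/* Depth-first walk: append a resource after all of its dependencies. */
static int luo_visit(struct luo_session *s, struct luo_resource *r)
{
	size_t i;

	if (r->seen_gen == s->gen)
		return 0;                       /* already handled in this pass */
	if (!r->preservable)
		return -1;
	r->seen_gen = s->gen;
	for (i = 0; i < r->ndeps; i++)
		if (luo_visit(s, r->deps[i]))
			return -1;
	if (s->nordered == LUO_MAX_RES)
		return -1;
	s->ordered[s->nordered++] = r;
	return 0;
}

/* Prepare: pull dependencies now and build the final ordered list. */
int luo_prepare(struct luo_session *s)
{
	size_t i;

	s->gen++;
	s->nordered = 0;
	for (i = 0; i < s->nrequested; i++) {
		if (luo_visit(s, s->requested[i])) {
			s->nordered = 0;
			return -1;
		}
	}
	/* Reverse the post-order so dependents come before dependencies. */
	for (i = 0; i < s->nordered / 2; i++) {
		struct luo_resource *tmp = s->ordered[i];

		s->ordered[i] = s->ordered[s->nordered - 1 - i];
		s->ordered[s->nordered - 1 - i] = tmp;
	}
	return 0;
}
```

In this model, registering a single FD-like resource that depends on a
device, which in turn depends on a global component, leaves the ordered
list empty until luo_prepare() runs. Only then are all three entries
produced, so nothing is added or removed as other components register.
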
To manage the preservation logic, we'll use specific handles
categorized into three types: fd, device, and global. Each handle type
will define callbacks for the different phases of the live update
process. For instance, a file-system-related handle might look
something like this:
struct liveupdate_fs_handle {
	struct list_head liveupdate_entry;
	/* Callback during prepare phase */
	int (*prepare)(struct file *filp, void *preserve_page, ...);
	/* Callback during reboot phase */
	int (*reboot)(struct file *filp, void *preserve_page, ...);
	/* Callback after successful update to do state clean-up */
	void (*finish)(struct file *filp, void *preserve_page, ...);
	/* Callback if prepare/reboot is cancelled */
	void (*cancel)(struct file *filp, void *preserve_page, ...);
};
The overall preservation sequence involves processing these handles in
a specific order:
1. Preserved File Descriptors (e.g., memfd, kvmfd, iommufd, vfiofd)
2. Preserved Devices (ordered appropriately, leaves-to-root)
3. Global State Components
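
The ordering above can be sketched with a small userspace model. This
is illustrative only: luo_handle, luo_prepare_all, and the demo
callbacks are hypothetical names, not part of the proposed interface.
The sketch also assumes that when a prepare callback fails, the handles
that were already prepared get cancelled in reverse order, and that the
failing handle itself is not cancelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handle types, listed in processing order. */
enum luo_handle_type { LUO_FD, LUO_DEVICE, LUO_GLOBAL, LUO_NR_TYPES };

struct luo_handle {
	enum luo_handle_type type;
	const char *name;
	int (*prepare)(struct luo_handle *h);
	void (*cancel)(struct luo_handle *h);
	int fail;                       /* demo only: force prepare to fail */
	struct luo_handle *next;        /* stack of already-prepared handles */
};

/* Demo log so the call order can be inspected. */
char luo_log[256];

static void luo_log_append(char op, const char *name)
{
	size_t len = strlen(luo_log);

	snprintf(luo_log + len, sizeof(luo_log) - len, "%c%s ", op, name);
}

int luo_demo_prepare(struct luo_handle *h)
{
	luo_log_append('+', h->name);
	return h->fail ? -1 : 0;
}

void luo_demo_cancel(struct luo_handle *h)
{
	luo_log_append('-', h->name);
}

/*
 * Prepare all handles one type at a time: FDs, then devices, then
 * globals. On failure, cancel the handles prepared so far, most
 * recently prepared first, and return the error.
 */
int luo_prepare_all(struct luo_handle **handles, size_t n)
{
	struct luo_handle *stack = NULL;
	int type, err = 0;
	size_t i;

	for (type = LUO_FD; type < LUO_NR_TYPES; type++) {
		for (i = 0; i < n; i++) {
			struct luo_handle *h = handles[i];

			if (h->type != (enum luo_handle_type)type)
				continue;
			assert(h->prepare && h->cancel);
			err = h->prepare(h);
			if (err)
				goto unwind;
			h->next = stack;
			stack = h;
		}
	}
	return 0;

unwind:
	while (stack) {
		struct luo_handle *h = stack;

		stack = h->next;
		h->next = NULL;
		h->cancel(h);
	}
	return err;
}
```

The handles may be passed in any order. The type loop enforces the
FD, device, global sequence, and the intrusive stack keeps the unwind
path free of allocations.
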
Let me know if this direction aligns with your expectations.
Pasha