Re: [PATCH v6 02/20] liveupdate: luo_core: integrate with KHO

From: Mike Rapoport
Date: Mon Nov 17 2025 - 16:06:00 EST


On Mon, Nov 17, 2025 at 01:29:47PM -0500, Pasha Tatashin wrote:
> On Sun, Nov 16, 2025 at 2:16 PM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> >
> > On Sun, Nov 16, 2025 at 09:55:30AM -0500, Pasha Tatashin wrote:
> > > On Sun, Nov 16, 2025 at 7:43 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > > >
> > > > > +static int __init liveupdate_early_init(void)
> > > > > +{
> > > > > +	int err;
> > > > > +
> > > > > +	err = luo_early_startup();
> > > > > +	if (err) {
> > > > > +		pr_err("The incoming tree failed to initialize properly [%pe], disabling live update\n",
> > > > > +		       ERR_PTR(err));
> > > >
> > > > How do we report this to the userspace?
> > > > I think the decision what to do in this case belongs there. Even if it's
> > > > down to choosing between plain kexec and full reboot, it's still a policy
> > > > that should be implemented in userspace.
> > >
> > > I agree that policy belongs in userspace, and that is how we designed
> > > it. In this specific failure case (ABI mismatch or corrupt FDT), the
> > > preserved state is unrecoverable by the kernel. We cannot parse the
> > > incoming data, so we cannot offer it to userspace.
> > >
> > > We report this state by not registering the /dev/liveupdate device.
> > > When the userspace agent attempts to initialize, it receives ENOENT.
> > > At that point, the agent exercises its policy:
> > >
> > > - Check dmesg for the specific error and report the failure to the
> > > fleet control plane.
> >
> > Hmm, this is not nice. I think we still should register /dev/liveupdate and
> > let userspace discover this error via /dev/liveupdate ABIs.
>
> Not registering the device is the correct approach here for two reasons:
>
> 1. This follows the standard Linux driver pattern. If a driver fails
> to initialize its underlying resources (hardware, firmware, or in this
> case, the incoming FDT), it does not register a character device.
> 2. Registering a "zombie" device that exists solely to return errors
> adds significant complexity. We would need to introduce a specific
> "broken" state to the state machine and add checks to IOCTLs to reject
> commands with a specific error code.

You can avoid that complexity if you register the device with a different
fops, but that's a technicality.
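
To make the "different fops" idea concrete, here is a rough userspace model
of it, not kernel code. The ops struct, the handler names and
luo_select_fops() are all made up for illustration and none of them come
from the patch. The table is picked once at registration time, so the normal
handlers never need to check for a "broken" state:

```c
/* Userspace model only. Every name below is hypothetical. */
#include <errno.h>

/* Stand-in for the subset of file_operations that matters here. */
struct luo_fops_model {
	int (*open)(void);
	long (*ioctl)(unsigned int cmd);
};

/* Saved result of early init: 0 on success, negative errno on failure. */
static int luo_incoming_err;

/* Normal handlers: assume the incoming state was parsed successfully. */
static int luo_normal_open(void)
{
	return 0;
}

static long luo_normal_ioctl(unsigned int cmd)
{
	(void)cmd;
	return 0;
}

/* Failed handlers: every entry point reports the saved init error. */
static int luo_failed_open(void)
{
	return luo_incoming_err;
}

static long luo_failed_ioctl(unsigned int cmd)
{
	(void)cmd;
	return luo_incoming_err;
}

static const struct luo_fops_model luo_fops = {
	.open = luo_normal_open,
	.ioctl = luo_normal_ioctl,
};

static const struct luo_fops_model luo_failed_fops = {
	.open = luo_failed_open,
	.ioctl = luo_failed_ioctl,
};

/* Pick the table once, based on what early init returned. */
static const struct luo_fops_model *luo_select_fops(int early_err)
{
	luo_incoming_err = early_err;
	return early_err ? &luo_failed_fops : &luo_fops;
}
```

With something like this, luo_select_fops(-EINVAL)->open() returns -EINVAL,
while the normal table is left untouched.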

Your point about treating the incoming FDT as an underlying resource that
failed to initialize makes sense, but userspace still needs a reliable way
to detect the failure, and parsing dmesg is not something we should rely
on.
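
For reference, this is roughly all the agent can do on its own if the device
is simply missing. It is a minimal sketch: luo_probe() is a made-up helper
name, and opening with O_RDWR is an assumption on my side. As far as I can
tell from the description above, -ENOENT here does not by itself say why the
node is absent:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the node at @path can be opened,
 * otherwise the negative errno from open(2).
 */
static int luo_probe(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -errno;

	close(fd);
	return 0;
}
```

An agent would call luo_probe("/dev/liveupdate") and get -ENOENT when the
node does not exist, with nothing more specific to act on.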

> Pasha

--
Sincerely yours,
Mike.