Re: [PATCH v1 18/19] Documentation/livepatch: Update documentation for state, callbacks, and shadow variables

From: Joe Lawrence
Date: Fri Mar 07 2025 - 10:51:09 EST


On 3/7/25 07:26, Petr Mladek wrote:
> On Thu 2025-03-06 17:54:41, Joe Lawrence wrote:
>>
>> With that in mind, a livepatch state could be thought of as an
>> indication of "a context needing special handling in a (versioned) way".
>
> I am not sure about the word "context". But it might be because I am
> not a native speaker.
>
> In my view, a state represents a side effect, made by loading
> the livepatch module (enabling livepatch), which would need a special
> handling when the livepatch module gets disabled or replaced.
>

Ok, yeah "side effect" works well here.

> There are currently two types of these side effects:
>
> + Changes made by callbacks which have to get reverted when
> the livepatch gets disabled.
>
> + Shadow variables which have to be freed when the livepatch
> gets disabled.
>

Right, these are livepatch-native side effects as we're providing the
API to introduce them. Nothing would stop a livepatch author from using
states to handle their own interesting side effects though. (I'm trying
to initially think of states in the abstract and then callbacks/shadow
variables are just implementations that tie together in the livepatching
core.)

> The states API was added to handle compatibility between more
> livepatches. I primary had the atomic replace mode in mind.
>
> Changes made by callbacks, and shadow variables added, by
> the current livepatch usually should get preserved during
> the atomic replace.
>
>> Livepatches claiming to deal with a particular state, needed to do so
>> with its latest or current versioning.
>
> The original approach was affected by the behavior of per-object callbacks
> during the atomic replace. Note that:
>
> Only per-object callbacks from the _new_ livepatch were
> called during the atomic replace.
>

Ah right ...

> As a result, previously, new livepatch, using the atomic replace, had to be
> able to revert changes made by the older livepatches when it did
> not want to keep them.
>
> This shows nicely that the original semantic was broken! There was
> no obvious way how to revert obsolete states using the atomic
> replace.
>
> The new livepatch needed to provide callbacks which were able to
> revert the obsoleted state. A workaround was to define it as
> the same state with a higher version. The same state and
> higher version made the livepatch compatible. The higher
> version signalized that it was a new variant (a revert) so that
> newer livepatches did not do the revert again...
>
>
>> A livepatch without a particular
>> state was not bound to any restriction on that state, so nothing
>> prevented subsequent atomic replace patches from blowing away existing
>> states (those patches cleaned up their states on their disabling),
>
> I am confused here. Is the talking about the original or new
> semantic?
>

... so nevermind about this.

> In the original semantic, only callbacks from the new livepatch
> were called during the atomic replace. Livepatches were
> compatible only when the new livepatch supported all
> existing states.
>
> In the new semantic, the callbacks from the obsolete livepatch
> are called for obsolete states. The new livepatch does not need
> to know about the state. By other words, the replaced livepatch
> can clean up its own mess.
>

Yes, the new model means that the last livepatch providing the state
also handles its cleanup on its way out. This is much more intuitive
than the previous semantic (for which I missed the step above :)

>> subsequent non-atomic replace patches from adding to the collective
>> livepatch state.
>
> Honestly, I do not think much about the non-atomic livepatches.
> It would require input from people who use this approach.
> It looks like a wild word to me ;-)
>
> I allowed to use the same states for more non-atomic livepatches
> because it might make sense to share shadow variables. Also more
> livepatches might depend on the same change made by callbacks
> and it need not matter which one is installed first.
>

IMHO the non-atomic world only made sense with cumulative patches. I
see some folks reporting that separate operating groups are layering
non-atomic patches across subsystems and my head spins.

But we are going to need to consider the use cases (or perhaps prevent
them) for the eventual documentation.

>
>> This patchset does away with .version and adds .block_disable. This is
>> very different from versioning as prevents the associated state from
>> ever going away. In an atomic-replace series of livepatches, this means
>> once a state is introduced, all following patches must contain that
>> state. In non-atomic-replace series, there is no restriction on
>> subsequent patches, but the original patch introducing the state cannot
>> ever be disabled/unloaded. (I'm not going to consider future hybrid
>> mixed atomic/not use cases as my brain is already full.)
>
> Yes, this describes the old behavior very well. And the impossibility
> to remove existing states using the atomic replace was one of the problems.
>
> The API solves this elegantly because it calls callbacks from
> the replaced livepatch for the obsolete states. The livepatch
> needed to implement these callbacks anyway to support the disable
> livepatch operation.
>
> And there is still the option to do not implement the reverse
> operation when it is not easy or safe. The author could set
> the new .block_disable flag. It blocks disabling the state.
> Which blocks disabling the livepatch or replacing it with
> a livepatch which does not support, aka is not compatible with,
> the state.
>
>
>> Finally, the patchset adds .is_shadow and .callbacks. A short sequence
>> of livepatches may look like:
>>
>> klp_patch A | klp_patch B
>> .states[x] | .states[y]
>> .id = 42 | .id = 42
>> .callbacks | .callbacks
>> .block_disable | .block_disable
>> .is_shadow | .is_shadow
>>
>> is there any harm or confusion if the two patches' state 42 contained
>> disparate .callbacks, .block_disable, or .is_shadow contents?
>
> Yes, two incompatible states with the same .id would break things.
> The callbacks won't be called and the old shadow variables
> won't get freed during an atomic replace.
>
> It is responsibility of the author of the livepatches to use
> different .id for different states.
>
> I am not sure if we could prevent mistakes. Hmm, we might add
> a check that every .id is there only once in the patch.states[] array.
> Also we could add a human readable .name of the state and ensure
> that it is the same. Or something like this.
>

Well, providing the same state twice in the same klp_patch seems highly
likely a bug by livepatch author. That's worth a WARN?

I'm not sure what to think about the same state id provided by two
klp_patches. For a atomic-replace series of patches, if the state
content is the same, it's effectively like handing off cleanup
responsibility for that state to the incoming patch, right? If the
state content changes, that would mean that the incoming patch is
redefining the state... which could be ok?

> Yeah, as I said. The non-atomic-replace world is a kind of jungle.
> It would require some real life users which might define some
> sane rules.
>

Yup. I can kinda grok things with cumulative non-replace, but it gets a
lot harder when considering other use cases cases. I'll put non-replace
aside for now and continue diving into the patchset. Thanks for the
explanations!

--
Joe