Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

From: Petr Mladek
Date: Mon Jan 27 2025 - 09:31:53 EST


On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:

[...]

> In the hybrid mode:
>
> - Specific livepatches can be marked as "non-replaceable" to ensure they
> remain active and unaffected during replacements.
>
> - Other livepatches can be marked as "replaceable," allowing targeted
> replacements of only those patches.
>
> This selective approach would reduce unnecessary transitions, lower the
> risk of temporary patch loss, and mitigate performance issues during
> livepatch replacement.
>
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> klp_for_each_object(old_patch, old_obj) {
> int err;
>
> + if (!old_patch->replaceable)
> + continue;

This is one example where things might get very complicated.

The same function might be livepatched by more livepatches, see
ops->func_stack. For example, let's have funcA and three livepatches:
a
+ lp1:
.replace = false,
.non-replace = true,
.func = {
.old_name = "funcA",
.new_func = lp1_funcA,
}, { }

+ lp2:
.replace = false,
.non-replace = false,
.func = {
.old_name = "funcA",
.new_func = lp2_funcA,
},{
.old_name = "funcB",
.new_func = lp2_funcB,
}, { }


+ lp3:
.replace = true,
.non-replace = false,
.func = {
.old_name = "funcB",
.new_func = lp3_funcB,
}, { }


Now, apply lp1:

+ funcA() gets redirected to lp1_funcA()

Then, apply lp2

+ funcA() gets redirected to lp2_funcA()

Finally, apply lp3:

+ The proposed code would add "nop()" for
funcA() because it exists in lp2 and does not exist in lp3.

+ funcA() will get redirected to the original code
because of the nop() during transition

+ nop() will get removed in klp_complete_transition() and
funcA() will get suddenly redirected to lp1_funcA()
because it will still be on ops->func_stack even
after the "nop" and lp2_funcA() gets removed.

=> The entire system will start using another funcA
implementation at some random time

=> this would violate the consistency model


The proper solution might be tricky:

1. We would need to detect this situation and do _not_ add
the "nop" for lp3 and funcA().

2. We would need a more complicate code for handling the task states.

klp_update_patch_state() sets task->patch_state using
the global "klp_target_state". But in the above example,
when enabling lp3:

+ funcA would need to get transitioned _backward_:
KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
, so that it goes on ops->func_stack:
lp2_funcA -> lp1->funcA

while:

+ funcA would need to get transitioned forward:
KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
, so that it goes on ops->func_stack:
lp2_funcB -> lp3->funcB


=> the hybrid mode would complicate the life for both livepatch
creators/maintainers and kernel code developers/maintainers.

I am afraid that this complexity is not acceptable if there are
better solutions for the original problem.

> err = klp_add_object_nops(patch, old_obj);
> if (err)
> return err;

I am sorry but I am quite strongly against this approach!

Best Regards,
Petr