Re: [External] Re: [PATCH] livepatch: Only block the removal of KLP_UNPATCHED forced transition patch
From: Chengming Zhou
Date: Fri Mar 04 2022 - 10:14:34 EST
On 2022/3/3 11:43 PM, Joe Lawrence wrote:
> On 3/3/22 5:33 AM, Chengming Zhou wrote:
>> On 2022/3/3 3:51 PM, Miroslav Benes wrote:
>>> On Thu, 3 Mar 2022, Chengming Zhou wrote:
>>>
>>>> Hi,
>>>>
>>>> On 2022/3/2 5:55 PM, Miroslav Benes wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 1 Mar 2022, Chengming Zhou wrote:
>>>>>
>>>>>> module_put() is currently never called for a patch with forced flag, to block
>>>>>> the removal of that patch module that might still be in use after a forced
>>>>>> transition.
>>>>>>
>>>>>> But since commit d67a53720966 ("livepatch: Remove ordering (stacking) of the
>>>>>> livepatches") removed stack ordering of the livepatches, klp_force_transition()
>>>>>> flags all patches on the list as forced. As a result, none of the other patches
>>>>>> can be unloaded after being disabled, even if they have completed the
>>>>>> KLP_UNPATCHED transition.
>>>>>>
>>>>>> In fact, we don't need to flag a patch as forced if it's a KLP_PATCHED forced
>>>>>> transition. It can still be unloaded, as long as it has passed through the
>>>>>> consistency model in a KLP_UNPATCHED transition.
>>>>>>
>>>>>> So this patch only sets the forced flag, and blocks the removal, for a livepatch
>>>>>> whose KLP_UNPATCHED transition was forced.
>>>>>>
>>>>>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
>>>>>> ---
>>>>>> kernel/livepatch/transition.c | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
>>>>>> index 5683ac0d2566..8b296ad9e407 100644
>>>>>> --- a/kernel/livepatch/transition.c
>>>>>> +++ b/kernel/livepatch/transition.c
>>>>>> @@ -641,6 +641,6 @@ void klp_force_transition(void)
>>>>>>  	for_each_possible_cpu(cpu)
>>>>>>  		klp_update_patch_state(idle_task(cpu));
>>>>>>
>>>>>> -	klp_for_each_patch(patch)
>>>>>> -		patch->forced = true;
>>>>>> +	if (klp_target_state == KLP_UNPATCHED)
>>>>>> +		klp_transition_patch->forced = true;
>>>>>
>>>>> I do not think this would interact nicely with the atomic replace feature.
>>>>> If you force the transition of a patch with ->replace set to true, no
>>>>> existing patch would get ->forced set with this change, which means all
>>>>> patches will be removed at the end of klp_try_complete_transition(). And
>>>>> that is something we want to prevent.
>>>>
>>>> Good point, I should check if it's an atomic replace livepatch in the else
>>>> branch, in which case we have to set all existing patches to forced.
>>>
>>> Yes, but that leads to a question if it then brings any value. Forcing a
>>> transition should be exceptional. If it is needed, there may be other
>>> issues involved which should probably be fixed. Have you come across a
>>> practical situation where the patch helped?
>>
>> Yes, you're right. The correct way is to find and fix the issues that
>> make us use this "force" transition interface, until we no longer need
>> it.
>>
>> Apart from that, another reason we may use the "force" transition
>> is to speed up the transition of some patches when loading them,
>> provided we can make sure these patches are safe to force
>> (much like an option to disable the consistency model check when loading a patch).
>>
>
> Interesting use case. Can you share any example livepatches where the
> transition time was exceptionally long and that led to requiring this
> patch?
Sorry, I don't have an easily reproducible test case on hand. Maybe I could try to
make one later that simulates the production environment.
>
> From a kpatch developer's perspective, it would be interesting to read
> how you go about ensuring forced livepatch safety. We don't generally
> build forced livepatches, so I'm curious how the dev/review process goes.
We use kpatch-build for some patches too. For other patches, which need to add
new members to a struct type or fix bugs in kernel functions, we may need to
rewrite the source patch to make a livepatch module. Some kinds of patches don't
need per-task consistency, and can even replace the old functions while tasks
still have the old functions on their stacks. For those, we may want to use the
"force" transition in case the load process times out.
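
For reference, the flagging rule discussed earlier in this thread (the target
state check from the patch, plus the atomic replace case Miroslav raised) could
be modelled roughly as below. This is only a userspace sketch, not kernel code:
struct patch_model and mark_forced() are made-up names, and flagging every
patch on the list in the replace case is an assumption that simply keeps the
current kernel behaviour for that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the enum values mirror the names used in the thread. */
enum klp_state { KLP_UNPATCHED = 0, KLP_PATCHED = 1 };

/* Hypothetical stand-in for the two klp_patch fields discussed here. */
struct patch_model {
	bool replace;	/* models klp_patch->replace (atomic replace) */
	bool forced;	/* models klp_patch->forced (blocks module removal) */
};

/*
 * Hypothetical helper: decide which patches are flagged after a forced
 * transition. "patches" models the global patch list, "transition" is the
 * index of the patch currently in transition, "target" is the target state.
 */
static void mark_forced(struct patch_model *patches, size_t n,
			size_t transition, enum klp_state target)
{
	size_t i;

	if (target == KLP_UNPATCHED) {
		/* Forced unpatch: tasks may still run this patch's code. */
		patches[transition].forced = true;
		return;
	}

	if (patches[transition].replace) {
		/*
		 * Forced atomic replace: tasks may still run code from any
		 * replaced patch, so flag every patch on the list
		 * (assumption: same as the current behaviour).
		 */
		for (i = 0; i < n; i++)
			patches[i].forced = true;
	}

	/* Forced KLP_PATCHED without replace: nothing is flagged. */
}
```

With three patches where only the last one has replace set, forcing a
KLP_PATCHED transition of the second patch flags nothing, forcing its
KLP_UNPATCHED transition flags only that patch, and forcing the replace
patch flags all three.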
Thanks.
>
> Thanks,