Re: [PATCH 2/3] livepatch,sched: Add livepatch task switching to cond_resched()

From: Josh Poimboeuf
Date: Wed Feb 15 2023 - 21:26:39 EST


On Wed, Feb 15, 2023 at 02:30:36PM +0100, Petr Mladek wrote:
> > static inline int _cond_resched(void)
> > {
> > + klp_sched_try_switch();
> > return __cond_resched();
>
> My only concern is if it might cause any performance problems.
>
> On one hand, cond_resched() is used in code paths that are slow
> on its own. Also it will do nothing most of the time.
>
> On the other hand, cond_resched() is typically used in cycles.
> One cycle might be fast. The code might be slow because there
> are too many cycles. Repeating the same failing test might
> prolong the time significantly.

Yes, but it should hopefully be very rare to patch a function in the
call stack of a kthread loop. In general it's a good idea for the patch
author to avoid that.

> An idea is to try the switch only when it was not done during
> a real schedule. Something like:
>
> static inline int _cond_resched(void)
> {
> int scheduled;
>
> scheduled = __cond_resched();
> if (scheduled)
> klp_sched_try_switch();
>
> return scheduled();
> }
>
> But it would make it less reliable/predictable. Also it won't work
> in configurations when cond_resched() is always a nop.
>
> I am probably too careful. We might keep it simple until any real
> life problems are reported.

If we can get away with it, I much prefer the simple unconditional
klp_sched_try_switch() because of the predictability and quickness with
which the kthread gets patched.

> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -76,6 +96,8 @@ static void klp_complete_transition(void)
> > klp_transition_patch->mod->name,
> > klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> >
> > + klp_cond_resched_disable();
> > +
>
> Nit: Strictly speaking, this is not needed when klp_complete_transition()
> is called from klp_cancel_transition(). In this case,
> klp_cond_resched_enable() was not called. So it might be moved into
> klp_try_complete_transition().

Argh, I always forget about that pesky klp_cancel_transition().

> More important thing, thinking loudly:
>
> We need to make sure that no task is in the middle
> klp_cond_resched_disable() when we modify anything that is used there.
>
> We seem to be on the safe side in klp_complete_transition(). We are
> here only when all tasks have TIF_PATCH_PENDING cleared. In this case,
> __klp_sched_try_switch() just returns. Also it calls
> klp_synchronize_transition() so that all tasks finish the critical part
> in __klp_sched_try_switch() before any new transition starts.
>
> But it is not the case in klp_reverse_transition(). It modifies
> klp_target_state() when __klp_sched_try_switch might be in the middle
> of klp_check_stack() and it might give wrong result.
>
> klp_reverse_transition() already solves similar race with
> klp_update_patch_state() by clearing all TIF_PATCH_PENDING flags
> and calling klp_synchronize_transition(). We just need to do
> it earlier. Something like:

Yes! Thanks, I can always count on you to find the race conditions ;-)

This highlights the similarities between klp_target_state(current) and
__klp_sched_try_switch(), they both access TIF_PATCH_PENDING
out-of-band.

Also, I'll update the comment in klp_copy_process(). It should be safe
for with __klp_sched_try_switch() for the same reason as
klp_update_patch_state(current): they all only work on 'current'.

--
Josh