Re: [PATCH] Documentation/livepatch: remove the limitation for schedule() patching

From: Miroslav Benes
Date: Mon Jan 09 2017 - 07:52:49 EST


On Fri, 6 Jan 2017, Josh Poimboeuf wrote:

> On Fri, Jan 06, 2017 at 03:00:45PM +0100, Miroslav Benes wrote:
> >
> > 2. reversion of the process does not work as expected. The kernel
> > crashes after the removal of the module. A task very likely slept in
> > schedule and was not migrated properly. It might be because of the races
> > in klp_reverse_transition() described by Petr, or might be somewhere
> > else. I'll look into it.
>
> Hm, will be interesting to see the cause of this...

The absence of the patched schedule() on the stack was the cause.
klp_try_switch_task() thus did not see it and happily migrated the task.

The reason is funny. One cannot patch __schedule() (which is the function
of interest) because of its notrace attribute, so all of its callers need
to be patched instead. I tried to make my life easier and patched only
schedule(). GCC then inlined the new __schedule() into the new schedule().
When I added the noinline attribute to the new __schedule(), everything
was fine (because the new schedule() was suddenly on the stack as
expected).
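
For illustration, here is a minimal userspace sketch of what the noinline
attribute changes. It is not the actual patch: new_schedule() and
new_inner_schedule() are hypothetical stand-ins for the replacement
schedule() and __schedule(), and the frame bookkeeping exists only to make
the effect observable. It assumes GCC or Clang.

```c
#include <stdint.h>

/* Frame addresses recorded by the two hypothetical stand-in functions. */
struct frames {
	uintptr_t outer;	/* frame of new_schedule() */
	uintptr_t inner;	/* frame of new_inner_schedule() */
};

/*
 * Stand-in for the replacement __schedule(). noinline forces the compiler
 * to emit a real call, so this function gets a stack frame of its own.
 */
static __attribute__((noinline)) void new_inner_schedule(struct frames *f)
{
	f->inner = (uintptr_t)__builtin_frame_address(0);
}

/*
 * Stand-in for the replacement schedule(). Passing the address of a local
 * also keeps the call below from being turned into a tail call.
 */
__attribute__((noinline)) struct frames new_schedule(void)
{
	struct frames f;

	f.outer = (uintptr_t)__builtin_frame_address(0);
	new_inner_schedule(&f);
	return f;
}
```

Calling new_schedule() and comparing the two recorded addresses shows two
distinct frames. Without the attribute the compiler is free to merge the
callee into its caller, which is the inlining described above.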

There is still one thing which I don't understand: why is __schedule()
(patched or original) not on the stack? The actual "sleep" should happen
in __switch_to_asm(), which is a C function now, and there is a call to
__switch_to_asm() in __schedule(). __schedule() thus should be on the
stack, shouldn't it? What am I missing? __switch_to_asm() pushes %rbp
on the stack...

Miroslav