Re: Question: livepatch failed for new fork() task stack unreliable

From: Josh Poimboeuf
Date: Wed Jun 03 2020 - 11:34:19 EST


On Wed, Jun 03, 2020 at 10:06:07PM +0800, Wangshaobo (bobo) wrote:
> Today I test your fix, but arch_stack_walk_reliable() still return failed
> sometimes, I
>
> found one of three scenarios mentioned failed:
>
>
> 1. user task (just fork) but not been scheduled
>
> ÂÂÂ test FAILED
>
> ÂÂÂ it is because unwind_next_frame() get the first frame, this time
> state->signal is false, and then return
>
> ÂÂÂ failed in the same place for ret_from_fork has not executed at all.

Oops - I meant to do it in __unwind_start (as you discovered).

> 2. user task (just fork) start excuting ret_from_fork() till schedule_tail
> but not UNWIND_HINT_REGS
>
>  test condition :loop fork() in current system
>
> ÂÂÂ result : SUCCESS,
>
> ÂÂÂ it looks like this modification works for my perspective :
>
> - /* Success path for non-user tasks, i.e. kthreads and idle tasks */
> - if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
> - return -EINVAL;
> but is this possible to miss one invalid judgement condition ? (1)

I'm not sure I understand your question, but I think this change
shouldn't break anything else because the ORC unwinder is careful to
always set an error if it doesn't reach the end of the stack, especially
with my recent ORC fixes which went into 5.7.

> 3. call_usermodehelper_exec_async
>
> ÂÂÂ test condition :loop call call_usermodehelper() in a module selfmade.
>
> ÂÂÂ result : SUCCESS,
>
> ÂÂ it looks state->signal==true works when unwind_next_frame() gets the end
> ret_from_fork() frame,
>
> ÂÂ but i don't know how does it work, i am confused by this sentences, how
> does the comment means sibling calls and
>
> ÂÂÂ calls to noreturn functions? (2)
>
> ÂÂÂ ÂÂÂ ÂÂÂ /*
> ÂÂÂÂ ÂÂÂ ÂÂÂ * Find the orc_entry associated with the text address.
> ÂÂÂÂ ÂÂÂ ÂÂÂ *
> Â ÂÂ ÂÂÂ ÂÂÂ * Decrement call return addresses by one so they work for sibling
> ÂÂÂÂ ÂÂÂ ÂÂÂ * calls and calls to noreturn functions.
> ÂÂÂÂ ÂÂÂ ÂÂÂ */
> ÂÂ ÂÂ ÂÂÂ Â orc = orc_find(state->signal ? state->ip : state->ip - 1);
> ÂÂ ÂÂ ÂÂÂ Â if (!orc) {

To be honest, I don't remember what I meant by sibling calls. They
don't even leave anything on the stack.

For noreturns, the code might be laid out like this:

func1:
...
call noreturn_foo
func2:

func2 is immediately after the call to noreturn_foo. So the return
address on the stack will actually be 'func2'. We want to retrieve the
ORC data for the call instruction (inside func1), instead of the
instruction at the beginning of func2.

I should probably update that comment.

--
Josh