Re: Question: livepatch failed for new fork() task stack unreliable

From: Josh Poimboeuf
Date: Mon Jun 01 2020 - 14:06:29 EST


On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote:
> 1) When a user-mode task has just been forked and is executing
> ret_from_fork() up to schedule_tail(), unwind_next_frame() finds that
> orc->sp_reg is ORC_REG_UNDEFINED but orc->end is nonzero. In this case
> arch_stack_walk_reliable() terminates its backtracing loop because
> unwind_done() returns true. The check
> 'if (!(task->flags & (PF_KTHREAD | PF_IDLE)))' in
> arch_stack_walk_reliable() is then true, so it returns -EINVAL.
>
> * The stack trace looks like that:
>
> ret_from_fork
>      -=> UNWIND_HINT_EMPTY
>      -=> schedule_tail             /* schedule out */
>      ...
>      -=> UNWIND_HINT_REGS          /* UNDO */

Yes, makes sense.
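
For reference, the decision made once the unwind loop has stopped can be
modelled in user space roughly as below. This is only a sketch of the
behaviour described above, not the kernel source: the walk_end enum, the
MODEL_PF_* flag values and model_reliable_result() are names made up for
illustration.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag bits; the kernel has its own PF_KTHREAD and PF_IDLE. */
#define MODEL_PF_IDLE    0x00000002u
#define MODEL_PF_KTHREAD 0x00200000u

/* How the (modelled) unwind loop came to a stop. */
enum walk_end {
	WALK_HIT_USER_REGS,	/* user-mode pt_regs were reached */
	WALK_DONE,		/* unwind_done() became true */
	WALK_ERROR,		/* unwind_error() became true */
};

/* Returns 0 if the trace counts as reliable, -EINVAL otherwise. */
int model_reliable_result(enum walk_end end, unsigned int task_flags)
{
	if (end == WALK_HIT_USER_REGS)
		return 0;		/* success path for user tasks */

	if (end == WALK_ERROR)
		return -EINVAL;		/* the 'if (unwind_error())' branch */

	/* Ending without user regs is only fine for kthreads and idle. */
	if (!(task_flags & (MODEL_PF_KTHREAD | MODEL_PF_IDLE)))
		return -EINVAL;

	return 0;
}

/* Case 1 above: the walk is "done", but the task is a user task. */
void model_selfcheck(void)
{
	assert(model_reliable_result(WALK_DONE, 0) == -EINVAL);
	assert(model_reliable_result(WALK_DONE, MODEL_PF_KTHREAD) == 0);
}
```

In this model a freshly forked user task that stops at the end-of-stack
marker without ever reaching user-mode regs is reported as unreliable,
which matches the -EINVAL described in case 1.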

> 2) When call_usermodehelper_exec_async() is used to create a user-mode
> task, ret_from_fork() has still not executed even though the task has
> been scheduled in __schedule(). At this point orc->sp_reg is
> ORC_REG_UNDEFINED but orc->end is zero, so unwind_error() returns true.
> This also terminates the backtracing loop in
> arch_stack_walk_reliable(), which ends up returning from the
> 'if (unwind_error())' branch.
>
> * The stack trace looks like that:
>
> -=> call_usermodehelper_exec
>   -=> do_exec
>     -=> search_binary_handler
>       -=> load_elf_binary
>         -=> elf_map
>           -=> vm_mmap_pgoff
>             -=> down_write_killable
>               -=> _cond_resched
>                 -=> __schedule          /* scheduled to work */
> -=> ret_from_fork       /* UNDO */

I don't quite follow the stacktrace, but it sounds like the issue is the
same as the first one you originally reported:

> 1) The task was not actually scheduled to execute. At this time
> UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, so its
> sp_reg and end fields keep their default values, which ends up throwing
> an error in unwind_next_frame() when called by
> arch_stack_walk_reliable();

Or am I misunderstanding?
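
To make the distinction between the two reports concrete, here is a small
user-space sketch of how an ORC entry with an undefined sp_reg is
classified, based only on the behaviour described in this thread
(end set means "done", end clear means "error"). The struct, enums and
model_classify() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the ORC sp_reg encoding; only "undefined" matters here. */
enum model_sp_reg {
	MODEL_REG_UNDEFINED,
	MODEL_REG_SP,
};

/* Minimal stand-in for the two ORC entry fields discussed above. */
struct model_orc_entry {
	enum model_sp_reg sp_reg;
	bool end;		/* set by an end-of-stack hint */
};

/* What the modelled unwinder does with the entry for the current IP. */
enum model_step {
	STEP_CONTINUE,		/* keep unwinding */
	STEP_END_OF_STACK,	/* unwind_done() would report true */
	STEP_ERROR,		/* unwind_error() would report true */
};

enum model_step model_classify(const struct model_orc_entry *orc)
{
	if (orc->sp_reg == MODEL_REG_UNDEFINED)
		return orc->end ? STEP_END_OF_STACK : STEP_ERROR;

	return STEP_CONTINUE;
}

/* The two situations from the report, expressed with the model. */
void model_selfcheck(void)
{
	struct model_orc_entry hinted  = { MODEL_REG_UNDEFINED, true };
	struct model_orc_entry default_entry = { MODEL_REG_UNDEFINED, false };

	assert(model_classify(&hinted) == STEP_END_OF_STACK);
	assert(model_classify(&default_entry) == STEP_ERROR);
}
```

Under this model the second report (sp_reg undefined, end zero) is the
error case, which is why it looks like the same issue as the original
report quoted above.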

And to reiterate, these are not "livepatch failures", right? Livepatch
doesn't fail when stack_trace_save_tsk_reliable() returns an error. It
recovers gracefully and tries again later.
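
As a rough illustration of "recovers gracefully and tries again later",
the sketch below models a task whose stack check fails a few times before
succeeding. Everything here is hypothetical (struct model_task,
model_stack_check(), model_transition()); the real transition code defers
the retry to a later point in time rather than spinning in a loop, so
treat this purely as a model of the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical task whose stack is unreliable for the first few checks. */
struct model_task {
	int unreliable_checks_left;
	bool patched;
};

/* Stand-in for the reliable stack check: 0 on success, -EINVAL if not. */
int model_stack_check(struct model_task *t)
{
	if (t->unreliable_checks_left > 0) {
		t->unreliable_checks_left--;
		return -EINVAL;
	}
	return 0;
}

/* One attempt: the task is switched only if its stack was reliable. */
bool model_try_switch(struct model_task *t)
{
	if (model_stack_check(t))
		return false;	/* not a failure, just "not yet" */

	t->patched = true;
	return true;
}

/* Retry up to max_attempts; returns the attempts used, or -1 if none worked. */
int model_transition(struct model_task *t, int max_attempts)
{
	for (int i = 1; i <= max_attempts; i++)
		if (model_try_switch(t))
			return i;

	return -1;
}

/* A task that is unreliable twice gets patched on the third attempt. */
void model_selfcheck(void)
{
	struct model_task t = { 2, false };

	assert(model_transition(&t, 5) == 3);
	assert(t.patched);
}
```

The point of the model is that an error from the stack check leaves the
task unpatched for now and is retried, instead of aborting the patch.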

--
Josh