Re: Question: livepatch failed for new fork() task stack unreliable

From: Wangshaobo (bobo)
Date: Mon Jun 01 2020 - 21:22:46 EST



On 2020/6/2 2:05, Josh Poimboeuf wrote:
On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote:
1) When a user-mode task has just been forked and is executing
ret_from_fork() up to schedule_tail(), unwind_next_frame() finds that
orc->sp_reg is ORC_REG_UNDEFINED but orc->end is not zero. In this case
arch_stack_walk_reliable() terminates its backtracing loop because
unwind_done() returns true. The check
'if (!(task->flags & (PF_KTHREAD | PF_IDLE)))' in
arch_stack_walk_reliable() is then true, and it returns -EINVAL.

* The stack trace looks like this:

ret_from_fork
      -=> UNWIND_HINT_EMPTY
      -=> schedule_tail             /* schedule out */
      ...
      -=> UNWIND_HINT_REGS      /* UNDO */
Yes, makes sense.

2) When call_usermodehelper_exec_async() is used to create a user-mode
task, ret_from_fork() has still not executed even though the task has
already been scheduled in __schedule(). At this point orc->sp_reg is
ORC_REG_UNDEFINED but orc->end is zero, so unwind_error() returns true.
This also terminates the backtracing loop in arch_stack_walk_reliable(),
which ends up returning from the 'if (unwind_error())' branch.

* The stack trace looks like this:

-=> call_usermodehelper_exec
         -=> do_exec
                  -=> search_binary_handler
                           -=> load_elf_binary
                                    -=> elf_map
                                             -=> vm_mmap_pgoff
-=> down_write_killable
-=> _cond_resched
             -=> __schedule           /* scheduled to work */
-=> ret_from_fork       /* UNDO */
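
To make the two cases easier to compare, here is a minimal user-space
sketch of the decision described above. It is not the kernel code: the
function name model_reliable_result() and its two boolean parameters are
made up for illustration, and it only models the moment where the walk
stops on a frame whose orc->sp_reg is ORC_REG_UNDEFINED.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code.
 *
 * Assumption: the unwinder has reached a frame whose ORC entry has
 * sp_reg == ORC_REG_UNDEFINED, as in both cases in this thread.
 *
 * orc_end         - models orc->end being non-zero
 * kthread_or_idle - models (task->flags & (PF_KTHREAD | PF_IDLE)) != 0
 *
 * Returns 0 when the trace would count as reliable, -EINVAL otherwise.
 */
static int model_reliable_result(bool orc_end, bool kthread_or_idle)
{
	/* Case 2: orc->end is zero, so the walk stops with an error. */
	if (!orc_end)
		return -EINVAL;

	/*
	 * Case 1: the walk stops cleanly (unwind_done() is true), but
	 * reaching the end without user-mode regs is only accepted for
	 * kthreads and idle tasks.
	 */
	if (!kthread_or_idle)
		return -EINVAL;

	return 0;
}
```

In this model a freshly forked user task gives
model_reliable_result(true, false) == -EINVAL (case 1) and
model_reliable_result(false, false) == -EINVAL (case 2). Only a kthread
or idle task that reaches a proper end marker gets 0.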
I don't quite follow the stacktrace, but it sounds like the issue is the
same as the first one you originally reported:

Yes, true, it is the same as the first one. The only difference I wanted to point out is that in this case the task has already been scheduled, whereas in the first case it has not.

1) The task was not actually scheduled to execute. At this time
UNWIND_HINT_EMPTY in ret_from_fork() has not reset unwind_hint, so its
sp_reg and end fields keep their default values, and unwind_next_frame()
ends up throwing an error when called by arch_stack_walk_reliable();
Or am I misunderstanding?

And to reiterate, these are not "livepatch failures", right? Livepatch
doesn't fail when stack_trace_save_tsk_reliable() returns an error. It
recovers gracefully and tries again later.
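
The "fails, then tries again later" behaviour can be pictured with a
small user-space sketch. Everything here is hypothetical
(model_try_transition(), model_check(), struct model_task); it only
illustrates that an unreliable stack check delays the transition rather
than aborting it, and it does not reproduce how the kernel schedules
its retries.

```c
#include <errno.h>

/*
 * Hypothetical callback type: returns 0 when the task's stack could be
 * checked reliably, or a negative errno when it could not.
 */
typedef int (*stack_check_fn)(void *ctx);

/*
 * Hypothetical retry loop. Returns the number of attempts that were
 * needed, or -1 if the check never succeeded within max_attempts.
 * Real code would wait between attempts; this model just loops.
 */
static int model_try_transition(stack_check_fn check, void *ctx,
				int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (check(ctx) == 0)
			return attempt;
	}
	return -1;
}

/* Models a task that is still in its "just forked" window. */
struct model_task {
	int unreliable_polls_left;
};

/* Fails with -EINVAL until the modelled window has passed. */
static int model_check(void *ctx)
{
	struct model_task *task = ctx;

	if (task->unreliable_polls_left > 0) {
		task->unreliable_polls_left--;
		return -EINVAL;
	}
	return 0;
}
```

With unreliable_polls_left set to 2, model_try_transition() returns 3:
two failed checks followed by a successful one, which is the "more
retries but still succeeds" pattern discussed here.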

Yes, you are right. By "livepatch failures" I only meant several failed retries. We found that when fork() happens frequently on the running system, retries are triggered more easily, but the transition always ends up succeeding.

So I think this question is related to the ORC unwinder. Could I ask whether you have a strategy or plan to avoid this problem?

thanks,

Wang ShaoBo