Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling
From: Thomas Gleixner
Date: Wed May 13 2020 - 20:13:08 EST
Steven, Mathieu
(combo reply)
Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
> On Wed, 13 May 2020 16:56:41 -0400 (EDT)
>> > + /* deal with pending signal delivery */
>> > + if (cached_flags & _TIF_SIGPENDING)
>> > + do_signal(regs);
>
> Looking deeper into this, it appears that do_signal() can freeze or kill the
> task.
>
> That is, it wont go back to user space here, but simply schedule out (being
> traced) or even exit (killed).
>
> Before the resume hooks would never be called in such cases, and now they
> are.
It theoretically matters because pending task work might kill the
task. That's the concern Andy and Peter had. Assume the following:
usermode
-> exception
set not fatal signal
-> exception
queue task work to kill task
<- return
<- return
The same could happen when the non fatal signal is set from a remote CPU.
So in theory that would result in:
handle non fatal signal first
handle task work which kills task
which would be the wrong order.
But that's just illusion.
>> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> Also, color me confused: is "do_signal()" actually running any user-space,
>> or just setting up the user-space stack for eventual return to signal
>> handler ?
I'm surprised that you can't answer that question yourself. How did you
ever make rseq work and how did rseq_signal_deliver() end up in
setup_rt_frame()?
Hint: Tracing might answer that question
And to cut it short:
Exit to user space happnes only through ONE channel, i.e. leaving
prepare_exit_to usermode().
exit_to_usermode_loop <-prepare_exit_to_usermode
do_signal <-exit_to_usermode_loop
get_signal <-do_signal
setup_sigcontext <-do_signal
do_syscall_64 <-entry_SYSCALL_64_after_hwframe
syscall_trace_enter <-do_syscall_64
sys_rt_sigreturn()
restore_altstack <-__ia32_sys_rt_sigreturn
syscall_slow_exit_work <-do_syscall_64
exit_to_usermode_loop <-do_syscall_64
>> Also, it might be OK, but we're changing the order of two things which
>> have effects on each other: restartable sequences abort fixup for preemption
>> and do_signal(), which also have effects on rseq abort.
>>
>> Because those two will cause the abort to trigger, I suspect changing
>> the order might be OK, but we really need to think this through.
That's a purely academic problem. The order is completely
irrelevant. You have to handle any order anyway:
usermode
-> exception / syscall
sets signal
<- return
prepare_exit_to_usemode()
cached_flags = READ_ONCE(t->flags);
exit_to_user_mode_loop(regs, cached_flags) {
while (cached_flags) {
local_irq_enable();
handle(cached_flags & RESCHED);
handle(cached_flags & UPROBE);
handle(cached_flags & PATCHING);
handle(cached_flags & SIGNAL);
handle(cached_flags & NOTIFY_RESUME);
handle(cached_flags & RETURN_NOTIFY);
local_irq_disable();
cached_flags = READ_ONCE(t->flags);
}
cached_flag is a momentary snapshot when attempting to return to user
space.
But after reenabling interrupts any of the relevant flag bits can be set
by an exception/interrupt or from remote. If preemption is enabled the
task can be scheduled out, migrated at any point before disabling
interrupts again. Even after disabling interrupts and before re-reading
cached flags there might be a remote change of flags.
That said, even for the case Andy and Peter were looking at (MCE) the
ordering is completely irrelevant.
I should have thought about this before, so thanks to both of you for
making me look at it again for the very wrong reasons.
Consider the patch dropped.
Thanks,
tglx