Re: [patch V4 part 1 05/36] x86/entry: Flip _TIF_SIGPENDING and _TIF_NOTIFY_RESUME handling

From: Thomas Gleixner
Date: Wed May 13 2020 - 20:13:08 EST


Steven, Mathieu

(combo reply)

Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
> On Wed, 13 May 2020 16:56:41 -0400 (EDT)
>> > + /* deal with pending signal delivery */
>> > + if (cached_flags & _TIF_SIGPENDING)
>> > + do_signal(regs);
>
> Looking deeper into this, it appears that do_signal() can freeze or kill the
> task.
>
> That is, it wont go back to user space here, but simply schedule out (being
> traced) or even exit (killed).
>
> Before the resume hooks would never be called in such cases, and now they
> are.

It theoretically matters because pending task work might kill the
task. That's the concern Andy and Peter had. Assume the following:

usermode

-> exception
set not fatal signal

-> exception
queue task work to kill task
<- return

<- return

The same could happen when the non fatal signal is set from a remote CPU.

So in theory that would result in:

handle non fatal signal first

handle task work which kills task

which would be the wrong order.

But that's just illusion.

>> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

>> Also, color me confused: is "do_signal()" actually running any user-space,
>> or just setting up the user-space stack for eventual return to signal
>> handler ?

I'm surprised that you can't answer that question yourself. How did you
ever make rseq work and how did rseq_signal_deliver() end up in
setup_rt_frame()?

Hint: Tracing might answer that question

And to cut it short:

Exit to user space happnes only through ONE channel, i.e. leaving
prepare_exit_to usermode().

exit_to_usermode_loop <-prepare_exit_to_usermode
do_signal <-exit_to_usermode_loop
get_signal <-do_signal
setup_sigcontext <-do_signal
do_syscall_64 <-entry_SYSCALL_64_after_hwframe
syscall_trace_enter <-do_syscall_64

sys_rt_sigreturn()
restore_altstack <-__ia32_sys_rt_sigreturn
syscall_slow_exit_work <-do_syscall_64
exit_to_usermode_loop <-do_syscall_64

>> Also, it might be OK, but we're changing the order of two things which
>> have effects on each other: restartable sequences abort fixup for preemption
>> and do_signal(), which also have effects on rseq abort.
>>
>> Because those two will cause the abort to trigger, I suspect changing
>> the order might be OK, but we really need to think this through.

That's a purely academic problem. The order is completely
irrelevant. You have to handle any order anyway:

usermode

-> exception / syscall
sets signal

<- return

prepare_exit_to_usemode()
cached_flags = READ_ONCE(t->flags);
exit_to_user_mode_loop(regs, cached_flags) {
while (cached_flags) {
local_irq_enable();

handle(cached_flags & RESCHED);
handle(cached_flags & UPROBE);
handle(cached_flags & PATCHING);
handle(cached_flags & SIGNAL);
handle(cached_flags & NOTIFY_RESUME);
handle(cached_flags & RETURN_NOTIFY);

local_irq_disable();

cached_flags = READ_ONCE(t->flags);
}

cached_flag is a momentary snapshot when attempting to return to user
space.

But after reenabling interrupts any of the relevant flag bits can be set
by an exception/interrupt or from remote. If preemption is enabled the
task can be scheduled out, migrated at any point before disabling
interrupts again. Even after disabling interrupts and before re-reading
cached flags there might be a remote change of flags.

That said, even for the case Andy and Peter were looking at (MCE) the
ordering is completely irrelevant.

I should have thought about this before, so thanks to both of you for
making me look at it again for the very wrong reasons.

Consider the patch dropped.

Thanks,

tglx