Re: [REGRESSION] x86/entry: TIF_SINGLESTEP handling is still broken

From: Linus Torvalds
Date: Sun Jan 31 2021 - 16:29:26 EST


On Sun, Jan 31, 2021 at 12:20 PM Gabriel Krisman Bertazi
<krisman@xxxxxxxxxxxxx> wrote:
>
>
> I think we should migrate TIF_SINGLESTEP to a SYSCALL_WORK flag as that
> is just a simple refactor.

That makes no sense at all to me.

A single-step has absolutely nothing to do with system calls, and it's
also not what any other architecture is doing.

So honestly, something is wrong if TIF_SINGLESTEP should be moved to
the SYSCALL_WORK_SYSCALL_xyz flags. That's not what single-stepping is
at all about.

It looks like commit 299155244770 ("entry: Drop usage of TIF flags in
the generic syscall code") is simply wrong. The whole premise for it
was wrong because it didn't notice that TIF_SINGLESTEP isn't a syscall
flag - and shouldn't be one.

The problem seems to be that TIF_SINGLESTEP has two different
meanings: the usual "enable single stepping", *and* then a special
magic hack to also do it at system call exit. You only seem to have
looked at the magic hack, not the actual REAL meaning of
TIF_SINGLESTEP, which causes TF to be set and the debug fault.

This whole code is very confusing. What is that
"arch_syscall_exit_tracehook()" thing in the first place? I see an
arch wrapper, but I don't see any architecture that actually defines
it, so it seems to always become just tracehook_report_syscall_exit().

It then does (on x86) the generic version, which does

if (step)
user_single_step_report(regs);
else
ptrace_report_syscall(regs, PTRACE_EVENTMSG_SYSCALL_EXIT);

where that user_single_step_report() is a nonsensical special case
that just does what a debug fault does for all the *usual* single step
events in exc_debug_user() (which does not use that pointless
"user_single_step_report()" function, but just does

send_sigtrap(regs, 0, get_si_code(dr6));

so please, let's take a step back here, and look at the basics:

- the *main* user of TIF_SINGLESTEP is regular debugging that causes
a #DB after each user space instruction, and causes that
send_sigtrap() in exc_debug_user()

- there is one very odd and weird special case that is about having
system call tracing enabled, which seems to have unclear semantics and
this is the case that got broken.

What's the fix here? Because no, single-stepping isn't about system
calls, and we should not try to change the code to be about system
calls, because no other architecture does that EITHER.

Now, it's possible that eventually

(a) the normal TF flag setting doesn't actually need TIF_SINGESTEP at all

(b) the TIF_SINGLESTEP bit really only needs to be about the odd
special case for system calls.

but as things are now, (a) is *NOT* true. We have very magical
behavior wrt TIF_SINGLESTEP, see enable_single_step(), and the comment
about

/*
* If TF was already set, check whether it was us who set it.
* If not, we should never attempt a block step.
*/

which actually ends up depending very subtly on whether TIF_SINGLESTEP
was already set *before* we did that enable_single_step(), and/or
whether TF was already set on the stack (because users can set it
themselves without ptrace).

Again, the special system call case is the special case, NOT the
normal case. Don't try to make TIF_SINGLESTEP be about the special
case.

Linus

Linus