Re: [patch 08/10] x86/entry/32: Remove the 0/-1 distinction from exception entries

From: Andy Lutomirski
Date: Wed Feb 26 2020 - 13:57:35 EST


On Wed, Feb 26, 2020 at 10:42 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Andy Lutomirski <luto@xxxxxxxxxx> writes:
>
> > On 2/25/20 1:36 PM, Thomas Gleixner wrote:
> >> Nothing cares about the -1 "mark as interrupt" in the errorcode anymore. Just
> >> use 0 for all excpetions which do not have an errorcode consistently.
> >>
> >
> > I sincerely wish this were the case. But look at collect_syscall() in
> > lib/syscall.c.
> >
> > It would be really quite nice to address this for real in some
> > low-overhead way. My suggestion would be to borrow a trick from 32-bit:
> > split regs->cs into ->cs and ->__csh, and stick CS_FROM_SYSCALL into
> > __csh for syscalls. This will only add any overhead at all to the int80
> > case. Then we could adjust syscall_get_nr() to look for CS_FROM_SYSCALL.
> >
> > What do you think? An alternative would be to use the stack walking
> > machinery in collect_syscall(), since the mere existence of that
> > function is abomination and we may not care about performance.
>
> Looking deeper. The code in common_exception does:
>
> movl PT_ORIG_EAX(%esp), %edx # get the error code
> movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
>
> So whatever the exception pushed on the stack the thing what
> collect_syscall finds is -1.
>
> The pushed value is used as the error_code argument for the exception
> handler and I really can't find a single one which cares (anymore).
>
> But darn and I overlooked that, it's propagated to do_trap() and
> friends, but even if this causes a user visible change, I doubt that
> anything cares about it today simply because for giggles a 64bit kernel
> unconditionally pushes 0 for all exceptions which do not have a hardware
> error code on stack. So any 32bit application which excpects a
> particular error code (0/-1) in the signal would have been broken on the
> first day it ran on a x64 bit kernel.
>
> If someone yells regression, then that's really trivial to fix in
> C-code.

I *think* this is plumbed much more directly to userspace:

$ cat /proc/$$/syscall
61 0xffffffff 0x7ffccf734ed0 0xa 0x0 0x1 0x0 0x7ffccf734eb8 0x7f0667465eda

That entire feature is highly dubious and I suppose we could just
delete it. But right now, we at least pretend that we can tell,
totally asynchronously, whether another task is in a syscall. Unless
we do *something*, though, I think you shouldn't make this change.

Sticking 0 in the error_code field in ucontext for a signal with no
error code seems entirely harmless to me in contrast.