Re: in_compat_syscall() on x86
From: Andy Lutomirski
Date: Mon Jan 04 2021 - 18:12:44 EST
> On Jan 4, 2021, at 2:36 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Eric W. Biederman
>> Sent: 04 January 2021 20:41
>>
>> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
>>
>>> On Mon, Jan 04, 2021 at 12:16:56PM +0000, David Laight wrote:
>>>> On x86 in_compat_syscall() is defined as:
>>>> in_ia32_syscall() || in_x32_syscall()
>>>>
>>>> Now in_ia32_syscall() is a simple check of the TS_COMPAT flag.
>>>> However in_x32_syscall() is a horrid beast that has to indirect
>>>> through to the original %eax value (ie the syscall number) and
>>>> check for a bit there.
>>>>
>>>> So on a kernel with x32 support (probably most distro kernels)
>>>> the in_compat_syscall() check is rather more expensive than
>>>> one might expect.
>>
>> I suggest you check the distro kernels. I suspect they don't compile in
>> support for x32. As far as I can tell x32 is an undead beast of a
>> subarchitecture that just enough people use that it can't be removed,
>> but few enough people use that it likely has a few lurking scary bugs.
>
> It is defined in the Ubuntu kernel configs I've got lurking:
> Both 3.8.0-19_generic (Ubuntu 13.04) and 5.4.0-56_generic (probably 20.04).
> Which is probably why it is in my test builds (I've just cut out
> a lot of modules).
>
>>>> It would be much better if both checks could be done together.
>>>> I think this would require the syscall entry code to set a
>>>> value in both the 64bit and x32 entry paths.
>>>> (Can a process make both 64bit and x32 system calls?)
>>>
>>> Yes, it bloody well can.
>>>
>>> And I see no benefit in pushing that logic into syscall entry,
>>> since anything that calls in_compat_syscall() more than once
>>> per syscall execution is doing the wrong thing. Moreover,
>>> in quite a few cases we don't call the sucker at all, and for
>>> all of those pushing that crap into syscall entry logic is
>>> pure loss.
>>
>> The x32 system calls have their own system call table and it would be
>> trivial to set a flag like TS_COMPAT when looking up a system call from
>> that table. I expect such a change would be purely in the noise.
>
> Certainly a write of 0/1/2 into a dirtied cache line of 'current'
> could easily cost absolutely nothing.
> Especially if current has already been read.
>
> I also wondered about resetting it to zero when an x32 system call
> exits (rather than entry to a 64bit one).
>
> For ia32 the flag is set (with |=) on every syscall entry.
> Even though I'm pretty sure it can only change during exec.
It can change for every syscall. I have tests that do this.
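To make that concrete, here is a rough user-space model of the checks described above. It is only a sketch, not kernel code: every model_* name is made up for illustration, the constant values are what I believe the x86 headers use for TS_COMPAT and __X32_SYSCALL_BIT, and clearing the flag on the way out is an assumption about the exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the x86 values; treat both as illustrative. */
#define MODEL_TS_COMPAT       0x0002u      /* status flag set on ia32 entry */
#define MODEL_X32_SYSCALL_BIT 0x40000000u  /* bit 30 of the syscall number */

/* Hypothetical entry paths; x32 shares the 64-bit entry path. */
enum model_entry { MODEL_ENTRY_SYSCALL64, MODEL_ENTRY_IA32 };

/* Stand-in for the per-task state the real checks look at. */
struct model_task {
    uint32_t status;   /* models the thread status word holding TS_COMPAT */
    uint64_t orig_ax;  /* models the saved original syscall number */
};

/* Entry: save the syscall number; the ia32 path also ORs in the flag. */
static inline void model_syscall_enter(struct model_task *t,
                                       enum model_entry entry, uint64_t nr)
{
    assert(t != NULL);
    t->orig_ax = nr;
    if (entry == MODEL_ENTRY_IA32)
        t->status |= MODEL_TS_COMPAT;
}

/* Exit: drop the flag so the next syscall starts clean (assumption). */
static inline void model_syscall_exit(struct model_task *t)
{
    assert(t != NULL);
    t->status &= ~MODEL_TS_COMPAT;
}

/* Cheap check: a single flag test. */
static inline bool model_in_ia32_syscall(const struct model_task *t)
{
    return (t->status & MODEL_TS_COMPAT) != 0;
}

/* The check that has to go back to the saved syscall number. */
static inline bool model_in_x32_syscall(const struct model_task *t)
{
    return (t->orig_ax & MODEL_X32_SYSCALL_BIT) != 0;
}

static inline bool model_in_compat_syscall(const struct model_task *t)
{
    return model_in_ia32_syscall(t) || model_in_x32_syscall(t);
}
```

In this model one task can issue a 64-bit, an x32 and an ia32 syscall back to back, and model_in_compat_syscall() gives a different answer for each one. Nothing is latched at exec time.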
>
>>> What's the point, really?
>>
>> Before we came up with the current games with __copy_siginfo_to_user
>> and x32_copy_siginfo_to_user I was wondering if we should make such
>> a change. The delivery of compat signal frames and core dumps which
>> do not go through the system call entry path could almost benefit from
>> a flag that could be set/tested when on those paths.
>
> For signal delivery it should (probably) depend on the system call
> that set up the signal handler.
I think it has worked this way for some time now.
> Although I'm sure I remember one kernel where some of it was done
> in libc (with a single entrypoint for all handlers).
>
>> The fact that only SIGCHLD (which can not trigger a coredump) is
>> different saves the coredump code from needing such a test.
>>
>> The fact that the signal frame code is simple enough it can directly
>> call x32_copy_siginfo_to_user or __copy_siginfo_to_user saves us there.
>>
>> So I don't think we have any cases where we actually need a flag that
>> is independent of the system call but we have come very close.
>
> If a program can do both 64bit and x32 system calls you probably
> need to generate a 64bit core dump if it has ever made a 64bit
> system call??
I think core dump should (and does) depend on the execution mode at the
time of the crash.

It’s worth noting that GCC’s understanding of mixed bitness is horrible.