Re: [PATCH 2/6] arch: x86: Wrap TIF_IA32 checks

From: Andy Lutomirski
Date: Wed Jul 29 2020 - 01:10:14 EST


On Tue, Jul 28, 2020 at 9:46 PM Gabriel Krisman Bertazi
<krisman@xxxxxxxxxxxxx> wrote:
>
> Andy Lutomirski <luto@xxxxxxxxxx> writes:
>
> > On Tue, Jul 28, 2020 at 1:22 PM Gabriel Krisman Bertazi
> > <krisman@xxxxxxxxxxxxx> wrote:
> >>
> >> In preparation to remove TIF_IA32, add wrapper that check the process
> >> has IA32 ABI without using the flag directly.
> >
> > Thank you for doing this, but let's please do it right. There is,
> > fundamentally, no such thing as a "process with IA32 ABI".
>
> Hi Andy,
>
> Thanks a lot for your review.
>
> As you can see, I'm learning my way here. Can you clarify "there is no
> such a thing as a 'process with IA32 ABI'"? I'm not sure if I confused
> the terminology or if (more worrisome for me) I got the concepts wrong.
>
> My understanding is that TIF_IA32 marks a thread that is running under
> the 32-bit compat mode, which would be running a 32-bit process (as in
> compiled with -m32, for instance), while TIF_X32 marks a process running
> under the X32 ABI. Each process would have only one of these
> "personalities". This is what I meant by a process with IA32 ABI (which
> is wrong in any case). Is there more to it, or is the problem the
> terminology I used?

There's more to it.

On sane architectures like Linux on ARM64, a process can be 32-bit or
64-bit. If you are a 64-bit ARM process, you run in 64-bit mode,
period. Anything other than execve() that lets you become 32-bit is a
bug.

Linux on x86 is nutty, though. (IIRC Sparc is too.) In particular,
it's entirely legal for a Linux program to switch between 32-bit and
64-bit mode all by itself. This can be done with LJMP, LRET, IRET, or
a few other hacks. It can also be done using any of the sigreturn
syscalls. (SYSENTER comes to mind, but anyone who uses SYSENTER to
switch modes is *nuts*.) Certain evil people named Andy Lutomirski
write kernel selftests that switch modes using sigreturn just to abuse
the kernel. But some real programs do this too. For example, I think
that CRIU restores 32-bit programs by starting out as a 64-bit
program, restoring everything, and then switching to 32-bit mode.
Keep in mind that most of these mode-switching tricks do not enter the
kernel at all.

The upshot is that kernel does not actually know what the running
program's ABI is. For all the kernel knows, a process uses the
Linux/psABI 64-bit ABI, the ia32 ABI, the x32 ABI, the Windows 32-bit
ABI, the Windows 64-bit ABI, and some gnarly ABI from DOS or OS/2, all
in the same process.

So we really shouldn't be trying to keep track of what we think the
user program should be doing, and instead we should pay attention to
what the user program actually did in its interactions with the
kernel. If we want to know whether a user task is running in 64-bit
mode right now, we can use user_64bit_mode(regs). (And keep in mind
that syscalls like rt_sigreturn() and anything that gets ptraced can
*change* the outcome of user_64bit_mode(regs) mid-syscall.) If we're
delivering a signal, we instead want to know the ABI of the signal's
handler, so we use is_ia32_frame(ksig) and is_x32_frame(ksig). If we
want to know whether mmap() should return values above 2^32-1, we
currently use some hackery, but we really ought to be looking at
whether the call was 64-bit mmap() or compat mmap(). (I had some
really half-baked patches to fix this at one point, but the core code
was a real mess and I didn't finish it.) If we still supported MPX in
the kernel, we'd have a real mess, because MPX malfunctions if user
programs switch bitness on us. Good riddance!

>
> I don't have any comments on the other things you mentioned, except that
> I need to go through them and better understand your suggestions. Would
> you prefer me to rework this patch series with what you suggested or is
> this something you want to take over and do yourself? Both ways are
> fine by me.

Please rework it :) I have seriously limited bandwidth right now.