Re: [RESEND RFC PATCH v2] arm64: Exposes support for 32-bit syscalls

From: Arnd Bergmann
Date: Fri Feb 12 2021 - 09:00:25 EST


On Fri, Feb 12, 2021 at 12:33 PM Steven Price <steven.price@xxxxxxx> wrote:
> On 11/02/2021 20:21, sonicadvance1@xxxxxxxxx wrote:

> > The problem:
> > We need to support 32-bit processes running under a userspace
> > compatibility layer. The compatibility layer is an AArch64 process.
> > This means exposing the 32bit compatibility syscalls to userspace.
>
> I'm not sure how you come to this conclusion. Running 32-bit processes
> under a compatibility layer is a fine goal, but it's not clear why the
> entire 32-bit compat syscall layer is needed for this.
>
> As a case in point QEMU's user mode emulation already achieves this in
> many cases without any changes to the kernel.

I think it's a quantitative difference, not a qualitative one:

qemu does a nice job at translating the interfaces for many combinations
of host and target architectures at a decent speed, and is improving
in both compatibility and performance over time.

What both Tango and FEX promise is to be much faster by focusing
on one target architecture each, and to have better compatibility than
what qemu can do.

> > Who does this matter to?
> > Any user that has a specific need to run legacy 32-bit software under a
> > compatibility layer.
> > Not all software is open source or easy to convert to 64bit, it's
> > something we need to live with.
> > Professional software and the gaming ecosystem is rife with this.
> >
> > What applications have tried to work around this problem?
> > FEX emulator (1) - Userspace x86 to AArch64 compatibility layer
> > Tango binary translator (2) - AArch32 to AArch64 compatibility layer
> > QEmu (3) - Not really but they do some userspace ioctl emulation
>
> Can you expand on "not really"? Clearly there are limitations, but in
> general I can happily "chroot" into a distro filesystem using an
> otherwise incompatible architecture using a qemu-xxx-static binary.

The ioctl emulation in qemu is limited in multiple ways:
- it needs to duplicate the kernel's compat emulation for
every single command it wants to handle, and will always
lag behind what gets merged into the kernel and what
drivers a particular distro ships.
- some ioctl commands cannot be emulated in user space
because the compat code relies on tracking device state
in the kernel.
- in some cases, emulation can be expensive, both in
runtime overhead and in code complexity.
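
To illustrate the first point, here is a minimal sketch of the kind of
per-command translation a user space emulator has to carry. The command
and both struct layouts (demo_req32, demo_req64) are hypothetical and
not taken from any real driver; every real command needs its own
variant of this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ioctl argument as laid out by a 32-bit guest. */
struct demo_req32 {
	uint32_t buf;	/* guest pointer: 32 bits wide */
	uint32_t len;	/* guest "unsigned long": 32 bits wide */
};

/* The same argument as a 64-bit host driver would expect it. */
struct demo_req64 {
	uint64_t buf;	/* host pointer: 64 bits wide */
	uint64_t len;	/* host "unsigned long": 64 bits wide */
};

/* Widen one request; guest pointers are zero-extended, never sign-extended. */
static void demo_req_to_host(const struct demo_req32 *in, struct demo_req64 *out)
{
	assert(in && out);
	memset(out, 0, sizeof(*out));
	out->buf = (uint64_t)in->buf;
	out->len = (uint64_t)in->len;
}
```

The emulator would call demo_req_to_host() before issuing the native
ioctl and do the reverse conversion for any data the driver writes back.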

> > What problems did they hit?
> > FEX and Tango hit problems with emulating memory related syscalls.
> > - Emulating 32-bit mmap, mremap, shmat in userspace changes behaviour
> > All three hit issues with ioctl emulation
> > - ioctls are free to do what they want including allocating memory and
> > returning opaque structures with pointers.
>
> Now I think we're getting to what the actual problems are:
>
> * mmap and friends have no (easy) way of forcing a mapping into a 32
> bit region.
> * ioctls are a mess
>
> The first seems like a reasonable goal - I've seen examples of MAP_32BIT
> being (ab)used to do this, but it actually restricts to 31 bits and it's
> not even available on arm64. Here I think you'd be better off focusing
> on coming up with a new (generic) way of restricting the addresses that
> the kernel will pick.

I think that would be useful for other projects as well.
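
For comparison, this is roughly what user space can do today without
such an interface: pass address hints to mmap() and check where the
mapping actually landed. It is only a sketch; demo_map_below_4g(), the
starting hint and the step size are made up for illustration, and the
kernel is free to ignore a hint.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define DEMO_LIMIT_4G (1ULL << 32)

/*
 * Try to place an anonymous mapping entirely below 4 GiB by passing
 * address hints. Returns NULL if no candidate address worked.
 */
static void *demo_map_below_4g(size_t len)
{
	uint64_t hint;

	assert(len > 0);
	if ((uint64_t)len >= DEMO_LIMIT_4G)
		return NULL;

	for (hint = 0x10000000ULL; hint + len <= DEMO_LIMIT_4G; hint += 0x01000000ULL) {
		void *p = mmap((void *)(uintptr_t)hint, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			continue;
		if ((uint64_t)(uintptr_t)p + len <= DEMO_LIMIT_4G)
			return p;	/* mapping ended up below the limit */
		munmap(p, len);		/* kernel picked a high address, try the next hint */
	}
	return NULL;
}
```

This only covers mappings the emulator creates itself. It does nothing
for memory the kernel allocates on behalf of the process, which is
where a kernel-side limit would help.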

> ioctls are going to be a problem whatever you do, and I don't think
> there is much option other than having a list of known ioctls and
> translating them in user space - see below.

In particular for the arm32-on-arm64 ioctl case, we have a known-working
implementation in the kernel, I don't see why we wouldn't want to use it.

The x86-32-on-anything case for FEX is trickier because it does
require handling the ia32 alignment case, and the proposed patch
does not handle that correctly for all commands. I think this would
be fixable in the kernel, but it requires a little more work.
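
A small sketch of the alignment difference: on ia32, a 64-bit integer
inside a struct is aligned to 4 bytes, while arm32 and the 64-bit ABIs
align it to 8, so the same fields end up at different offsets. The
struct and type names below are hypothetical, and the aligned attribute
is a GCC/Clang extension used here only to model the ia32 layout on
another host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with ia32 struct alignment (4 bytes). */
typedef uint64_t __attribute__((aligned(4))) demo_ia32_u64;

/* Hypothetical ioctl argument as an ia32 guest lays it out. */
struct demo_arg_ia32 {
	uint32_t id;
	demo_ia32_u64 value;	/* offset 4, no padding before it */
};

/* The same fields with the host's natural alignment. */
struct demo_arg_native {
	uint32_t id;
	uint64_t value;		/* offset 8 where u64 is 8-byte aligned */
};

_Static_assert(offsetof(struct demo_arg_ia32, value) == 4, "ia32 offset");
_Static_assert(sizeof(struct demo_arg_ia32) == 12, "ia32 size");

/* Copy field by field; a plain memcpy would misplace 'value'. */
static void demo_arg_from_ia32(const struct demo_arg_ia32 *in,
			       struct demo_arg_native *out)
{
	assert(in && out);
	out->id = in->id;
	out->value = in->value;
}
```

Since the ioctl command number encodes the argument size, a struct like
this also yields a different command number for the ia32 caller, so
such commands need their own handling.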

> > This is now exposing the compat syscalls to userspace, but for the sake
> > of userspace compatibility it is a necessary evil.
>
> You've yet to convince me that it's "necessary" - I agree on the "evil"
> part ;)

I think it's much easier to argue in favor of exposing the kernel's
ioctl() emulation and a process-specific address limit for
get_unmapped_area() than for doing the entire syscall emulation.

The emulation for any of the other syscalls should be manageable
once ioctl can be called directly, though there are a couple that
could fall into the same category (setsockopt, sendmsg/recvmsg,
fcntl).

Arnd