Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

From: Al Viro
Date: Sun Sep 20 2020 - 14:08:06 EST

On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > Add a flag to force processing a syscall as a compat syscall. This is
> > required so that in_compat_syscall() works for I/O submitted by io_uring
> > helper threads on behalf of compat syscalls.
>
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32. The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

Let's separate two issues:
1) compat syscalls want 32bit iovecs. Nothing to do with the
drivers, dealt with just fine.
2) a few drivers are really fucked in the head. They use different
*DATA* layouts for reads/writes, depending upon the calling process.
IOW, if you fork/exec a 32bit binary and your stdin is one of those,
reads from stdin in parent and child will yield different data layouts.
On the same struct file.
That's what Christoph worries about (/dev/sg he'd mentioned is
one of those).
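
To make (2) concrete, here is a userspace sketch of the timestamp flavour of
the problem. It is not kernel code: struct ev64, struct ev32 and emit_event()
are made-up names, loosely modelled on an input event with a timestamp, and
the compat_caller argument stands in for what in_compat_syscall() would
report. The same event comes out as 16 or 24 bytes depending on who asks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout seen by a 64bit caller. */
struct ev64 {
	int64_t  sec;
	int64_t  usec;
	uint16_t type;
	uint16_t code;
	int32_t  value;
};

/* Hypothetical layout seen by a 32bit caller: narrower timestamps. */
struct ev32 {
	int32_t  sec;
	int32_t  usec;
	uint16_t type;
	uint16_t code;
	int32_t  value;
};

static_assert(sizeof(struct ev64) == 24, "64bit layout is 24 bytes");
static_assert(sizeof(struct ev32) == 16, "32bit layout is 16 bytes");

/*
 * Stand-in for a ->read() instance that picks the data layout from the
 * caller rather than from the file. Returns the number of bytes written.
 */
static size_t emit_event(void *buf, int compat_caller,
			 int64_t sec, int64_t usec,
			 uint16_t type, uint16_t code, int32_t value)
{
	if (compat_caller) {
		struct ev32 e = { (int32_t)sec, (int32_t)usec, type, code, value };

		memcpy(buf, &e, sizeof(e));
		return sizeof(e);
	}

	struct ev64 e = { sec, usec, type, code, value };

	memcpy(buf, &e, sizeof(e));
	return sizeof(e);
}
```

Two readers of one struct file get different bytes for the same event; a
helper thread doing the read on somebody's behalf has no way to tell which
layout the submitter expected.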

IMO we should simply have that dozen or so of pathological files
marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
it describes the userland ABI provided by those. And it's cast in stone.

Any in_compat_syscall() in ->read()/->write() instances is an ABI
bug, plain and simple. Some are unfixable for compatibility reasons, but
any new caller like that should be a big red flag.

How we import iovec array is none of the drivers' concern; we do
not need to mess with in_compat_syscall() reporting the matching value,
etc. for that. It's about the instances that want in_compat_syscall() to
decide between the 32bit and 64bit data layouts. And I believe that
we should simply have them marked as such and rejected by io_uring. With
any new occurrences getting slapped down hard.
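
Roughly what I have in mind, as a userspace mock-up rather than a patch.
Only the flag name comes from this mail; the bit value, struct mock_file,
mock_uring_check_file() and the choice of -EOPNOTSUPP are assumptions made
for the sake of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag name as proposed above; the bit value here is arbitrary. */
#define FMODE_SHITTY_ABI	(1u << 0)

/* Hypothetical stand-in for struct file; only the mode bits matter. */
struct mock_file {
	unsigned int f_mode;
};

/*
 * Hypothetical check on the io_uring side, done before a read or write
 * is queued. Files whose data layout depends on the caller are refused
 * outright; everything else goes through untouched.
 */
static int mock_uring_check_file(const struct mock_file *file)
{
	assert(file != NULL);

	if (file->f_mode & FMODE_SHITTY_ABI)
		return -EOPNOTSUPP;	/* error code is an assumption */

	return 0;
}
```

The flag would be set by the driver for the files listed below, regardless
of how they had been opened, since it describes the ABI and not the opener.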

Current list of those turds:
/dev/sg (pointer-chasing, generally insane)
/sys/firmware/efi/vars/*/raw_var (fucked binary structure)
/sys/firmware/efi/vars/new_var (fucked binary structure)
/sys/firmware/efi/vars/del_var (fucked binary structure)
/dev/uhid (pointer-chasing for one obsolete command)
/dev/input/event* (timestamps)
/dev/uinput (timestamps)
/proc/bus/input/devices (fucked bitmap-to-text representation)
/sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)