Re: [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls
From: Amanieu d'Antras
Date: Fri May 21 2021 - 15:19:10 EST
On Fri, May 21, 2021 at 9:51 AM Steven Price <steven.price@xxxxxxx> wrote:
> >> In those cases to correctly emulate seccomp, isn't Tango is going to
> >> have to implement the seccomp filter in user space?
> >
> > I have not implemented user-mode seccomp emulation because it can
> > trivially be bypassed by spawning a 64-bit child process which runs
> > outside Tango. Even when spawning another translated process, the
> > user-mode filter will not be preserved across an execve.
>
> Clearly if you have user-mode seccomp emulation then you'd hook execve
> and either install the real BPF filter (if spawning a 64 bit child
> outside Tango) or ensure that the user-mode emulation is passed on to
> the child (if running within Tango).
Spawning another process is just an example. Fundamentally, Tango is
not intended or designed to be a sandbox around the 32-bit code. For
example, many of the newer ioctls use u64 instead of a pointer type to
avoid the need for a compat_ioctl handler. This means that such ioctls
could be abused to read/write any address in the process address
space, including the code that is performing the usermode seccomp
emulation.
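
As a rough illustration of the u64 point, here is a minimal sketch of
what such an ioctl argument tends to look like. struct demo_xfer, its
fields and demo_ptr_to_u64() are made up for this example and are not
taken from any real driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument; not taken from any real driver. */
struct demo_xfer {
	uint64_t buf;   /* user address carried as a plain 64-bit integer */
	uint32_t len;   /* buffer length in bytes */
	uint32_t flags; /* fills the tail so the struct is 16 bytes everywhere */
};

/* Same layout for 32-bit and 64-bit callers, so no compat_ioctl handler. */
_Static_assert(sizeof(struct demo_xfer) == 16,
	       "layout must not depend on pointer size");
_Static_assert(offsetof(struct demo_xfer, len) == 8,
	       "len follows the 64-bit address");

/*
 * A 32-bit pointer zero-extends into buf, but nothing stops translated
 * code from storing an arbitrary 64-bit address there instead.
 */
static inline uint64_t demo_ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```

Because buf is an integer rather than a pointer, the kernel has no way
to tell that the caller was only supposed to see the low 4GB.
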
> You already have a potential 'issue' here of a 64 bit process setting up
> a seccomp filter and then execve()ing a 32 bit (Tango'd) process. The
> set of syscalls needed for the system which supports AArch32 natively is
> going to be different from the syscalls needed for Tango. (Fundamentally
> this is a major limitation with the whole seccomp syscall filtering
> approach).
The specific example I had in mind here is Android, which installs a
global seccomp filter on the zygote process from which app processes
are forked. This filter is designed for mixed arm32/arm64 systems
and therefore has syscall whitelists for both AArch32 and AArch64.
This filter allows 32-bit processes to spawn 64-bit processes and
vice-versa: for example, many 32-bit apps will invoke another 32-bit
executable via system() which uses a 64-bit /system/bin/sh.
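
The decision such a dual-arch filter makes can be modelled in plain C
as below. This is only a sketch of the logic, not a BPF program and not
Android's actual filter: the demo_* names and the three-entry lists are
invented for illustration, and the two arch constants repeat the values
of AUDIT_ARCH_ARM and AUDIT_ARCH_AARCH64 from <linux/audit.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in <linux/audit.h>; repeated here to stay portable. */
#define DEMO_AUDIT_ARCH_ARM     0x40000028u
#define DEMO_AUDIT_ARCH_AARCH64 0xC00000B7u

/* Illustrative allowlists only; a real filter would list far more. */
static const int demo_arm32_allowed[] = { 3, 4, 20 };    /* read, write, getpid */
static const int demo_arm64_allowed[] = { 63, 64, 172 }; /* read, write, getpid */

static bool demo_in_list(const int *list, size_t n, int nr)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == nr)
			return true;
	return false;
}

/* The arch value selects which allowlist the syscall number is checked against. */
static bool demo_filter_allows(uint32_t arch, int nr)
{
	switch (arch) {
	case DEMO_AUDIT_ARCH_ARM:
		return demo_in_list(demo_arm32_allowed,
				    sizeof(demo_arm32_allowed) / sizeof(int), nr);
	case DEMO_AUDIT_ARCH_AARCH64:
		return demo_in_list(demo_arm64_allowed,
				    sizeof(demo_arm64_allowed) / sizeof(int), nr);
	default:
		return false; /* unknown architecture: deny */
	}
}
```
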
> >> I guess the question comes down to how big a hole is
> >> syscall_in_tango_whitelist() - if Tango only requires a small set of
> >> syscalls then there is still some security benefit, but otherwise this
> >> doesn't seem like a particularly big benefit considering you're already
> >> going to need the BPF infrastructure in user space.
> >
> > Currently Tango only whitelists ~50 syscalls, which is small enough to
> > provide security benefits and definitely better than not supporting
> > seccomp at all.
>
> Agreed, and I don't want to imply that this approach is necessarily
> wrong. But given that the approach of getting the kernel to do the
> compat syscall filtering is not perfect, I'm not sure in itself it's a
> great justification for needing the kernel to support all the compat
> syscalls.
I feel that exposing all compat syscalls to 64-bit processes is better
than the alternative of only exposing a subset of them. Off the top of
my head I can think of quite a few compat syscalls that cannot be
fully emulated in userspace and would need to be exposed in the
kernel:
- mmap/mremap/shmat/io_setup: anything that allocates VM space needs
  to return a pointer in the low 4GB.
- ioctl: too many variants to reasonably maintain a separate compat
  layer in userspace.
- getdents/lseek: ext4 uses 32-bit directory offsets for 32-bit processes.
- get_robust_list/set_robust_list: different in-memory ABI for
  32/64-bit processes.
- open: don't force O_LARGEFILE for 32-bit processes.
- io_uring_setup: different in-memory ABI for 32/64-bit processes.
- (and possibly many others)
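
To make the first item concrete: every mapping handed to translated
32-bit code has to satisfy a check along the lines of the sketch below.
demo_fits_compat() is a hypothetical helper written for this example,
not a function from Tango or from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the whole range [addr, addr + len) lies below 4GB. */
static bool demo_fits_compat(uint64_t addr, uint64_t len)
{
	const uint64_t limit = UINT64_C(1) << 32;

	/* Written this way so that addr + len cannot overflow. */
	return len <= limit && addr <= limit - len;
}
```

A compat mmap gives this guarantee in the kernel; a userspace layer
would have to enforce it itself for each of the allocating syscalls
listed above.
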
Also consider the churn involved when adding a new syscall which
behaves differently in compat processes: rather than just using
in_compat_syscall() or wiring up a COMPAT_SYSCALL_DEFINE, a compat
variant of this syscall would also need to be added to the 64-bit
syscall table to support translation layers like Tango and FEX.
> One other thought: I suspect in practise there aren't actually many
> variations in the BPF programs used with seccomp. It may well be quite
> possible to convert the 32-bit syscall filtering programs to filter the
> equivalent 64-bit syscalls that Tango would use. Sadly this would be
> fragile if a program used a BPF program which didn't follow the "normal"
> pattern.
This might work for simple filters that only look at the syscall
number, but becomes much harder when the filter also inspects the
syscall arguments.
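
A small sketch of why: a number-only translation is a table lookup, but
as soon as the 64-bit equivalent takes its arguments in different
positions, any argument check in the filter has to be rewritten too.
The table below is illustrative only. It assumes the translation layer
implements the AArch32 open() via openat(), since AArch64 has no open()
syscall; struct demo_sysmap and demo_lookup() are made-up names.

```c
#include <stddef.h>

struct demo_sysmap {
	int arm32_nr;  /* AArch32 (EABI) syscall number */
	int arm64_nr;  /* AArch64 syscall number */
	int arg_shift; /* how far each argument index moves */
};

static const struct demo_sysmap demo_map[] = {
	{  3,  63, 0 }, /* read   -> read */
	{  4,  64, 0 }, /* write  -> write */
	{ 20, 172, 0 }, /* getpid -> getpid */
	{  5,  56, 1 }, /* open(path, flags, mode) -> openat(dirfd, path, ...) */
};

/* Returns the mapping for an AArch32 syscall number, or NULL if unknown. */
static const struct demo_sysmap *demo_lookup(int arm32_nr)
{
	for (size_t i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
		if (demo_map[i].arm32_nr == arm32_nr)
			return &demo_map[i];
	return NULL;
}
```

A filter that tests args[1] (flags) of open() would have to test
args[2] of openat() after conversion, and that is before considering
arguments that point to structures with a different 32-bit layout.
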