Re: [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls

From: Steven Price
Date: Wed May 19 2021 - 11:30:40 EST


On 19/05/2021 00:51, Amanieu d'Antras wrote:
> On Tue, May 18, 2021 at 2:03 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>> I'm still undecided about this approach. It is an easy way to expose the 32-bit
>> ABIs, it mostly copies what x86-64 already does with 32-bit syscalls and
>> it doesn't expose a lot of attack surface that isn't already exposed to normal
>> 32-bit tasks running compat mode.
>>
>> On the other hand, exposing the entire aarch32 syscall set seems both
>> too broad and not broad enough: Half of the system calls behave the
>> exact same way in native and compat mode, so they wouldn't need to
>> be exposed like this, a lot of others are trivially emulated in user space
>> by calling the native versions. The syscalls that are actually hard to do
>> such as ioctl() or the signal handling will work for aarch32 emulation, but
>> they are still insufficient to correctly emulate other 32-bit architectures
>> that have a slightly different ABI. This means the interface is a fairly good
>> fit for Tango, but much less so for FEX.
>>
>> It's also worth pointing out that this approach has a few things in common
>> with Yury's ilp32 tree at https://github.com/norov/linux/tree/ilp32-5.2
>> Unlike the x86 x32 mode, that port however does not allow calling compat
>> syscalls from normal 64-bit tasks but rather keys the syscall entry point
>> off the executable format, which wouldn't work here. It also uses the
>> asm-generic system call numbers instead of the arm32 syscall numbers.
>>
>> I assume you have already considered or tried the alternative approach of
>> only adding a minimal set of syscalls that are needed for the emulation.
>> Having a way to limit the address space for mmap() and similar
>> system calls sounds like a generally useful addition, and having an
>> extended variant of ioctl() that lets you pick the target ABI (arm32, x86-32,
>> ...) on supported drivers would probably be better for FEX. Can you
>> explain the tradeoffs that led you towards duplicating the syscall
>> entry points instead?
>
> Tango needs the entire compat ABI to be exposed to support seccomp for
> translated AArch32 processes. Here's how this works:
>
> 1. When a translated process installs a seccomp filter, Tango injects
> a prefix into the seccomp program which effectively does:
>     if (arch == AUDIT_ARCH_AARCH64) {
>         // 64-bit syscalls used by Tango for internal operations
>         if (syscall_in_tango_whitelist(nr))
>             return SECCOMP_RET_ALLOW;
>     }
>     // continue to user-supplied seccomp program
>
> 2. When Tango performs a 32-bit syscall on behalf of the translated
> process, the seccomp filter will see a syscall with AUDIT_ARCH_ARM and
> the compat syscall number. This allows the user-supplied seccomp
> filter to behave exactly as if it was running in a native AArch32
> process.
>

Perhaps I'm missing something, but surely some syscalls that would be
native on 32-bit will have to be translated by Tango to 64-bit syscalls
to do the right thing? E.g. from the previous patch, compat sigreturn
isn't available.

In those cases, to correctly emulate seccomp, isn't Tango going to
have to implement the seccomp filter in user space?

I guess the question comes down to how big a hole is
syscall_in_tango_whitelist() - if Tango only requires a small set of
syscalls then there is still some security benefit, but otherwise this
doesn't seem like a particularly big benefit considering you're already
going to need the BPF infrastructure in user space.
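
To make the size of that hole concrete, here is a rough user-space model
of the prefix logic as described in the quoted text. It is only a sketch:
the allowlist contents, the function names and the verdict enum are
hypothetical and are not taken from Tango or from the patch series. The
syscall numbers are the arm64 (asm-generic) ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>   /* AUDIT_ARCH_AARCH64, AUDIT_ARCH_ARM */

/* Hypothetical verdicts for the injected prefix. */
enum prefix_verdict {
	PREFIX_ALLOW,     /* prefix returns SECCOMP_RET_ALLOW */
	PREFIX_CONTINUE,  /* fall through to the user-supplied filter */
};

/* Hypothetical allowlist of native arm64 syscalls used internally. */
static const uint32_t tango_allowlist[] = {
	63,   /* read */
	64,   /* write */
	139,  /* rt_sigreturn */
};

/* Linear search: true if nr appears in the given list. */
static bool syscall_in_allowlist(uint32_t nr, const uint32_t *list, size_t len)
{
	assert(list != NULL);
	for (size_t i = 0; i < len; i++)
		if (list[i] == nr)
			return true;
	return false;
}

/* Model of the prefix: only native (AArch64) syscalls can be short-circuited. */
static enum prefix_verdict tango_prefix(uint32_t arch, uint32_t nr)
{
	size_t len = sizeof(tango_allowlist) / sizeof(tango_allowlist[0]);

	if (arch == AUDIT_ARCH_AARCH64 &&
	    syscall_in_allowlist(nr, tango_allowlist, len))
		return PREFIX_ALLOW;

	return PREFIX_CONTINUE;
}
```

In this model, every entry in the allowlist is a native syscall that the
user-supplied filter never gets to see, whatever policy the translated
process installed. Compat syscalls (AUDIT_ARCH_ARM) always reach the user
filter.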

Or perhaps I'm wrong and there's some magic that makes this work in the
kernel?

Steve