Re: ARM SVE ABI: kernel dropping SVE/SME state on syscalls

From: Mark Rutland
Date: Tue Apr 02 2024 - 14:11:59 EST


On Wed, Mar 27, 2024 at 05:30:00PM -0700, Vineet Gupta wrote:
> Hi Will, Marc,
>
> In the RISC-V land we are hitting an issue and need some help
> understanding the SVE ABI about dropping the state on syscalls (and its
> implications etc - in hindsight)
>
> If I'm reading the arm64 code correctly, SVE state is unconditionally
> (for any syscall whatsoever) dropped in following code path:
>
> el0_svc
> fp_user_discard
>
> The RISC-V Vector ABI mandates something similar and kernel implements
> something similar.
>
> 2023-06-29 9657e9b7d253 riscv: Discard vector state on syscalls
>
> However in recent testing with RISC-V vector builds we are running into
> an issue when this just doesn't work.
>
> Just for some background, RISC-V vector instructions rely on
> additional state in a VTYPE register which is set up using an a priori
> VSETVLI insn.
> So consider the following piece of code:
>
> 3ff80: cc787057  vsetivli zero,16,e8,mf2,ta,ma  <-- sets up VTYPE
> 3ff84: 44d8      lw a4,12(s1)
> 3ff86: 449c      lw a5,8(s1)
> 3ff88: 06f75563  bge a4,a5,3fff2
> 3ff8c: 02010087  vle8.v v1,(sp)
> 3ff90: 020980a7  vse8.v v1,(s3)  <-- Vector store instruction
>
> Here's the sequence of events that's causing the issue
>
> 1. The vector store instruction (in say bash) takes a page fault, enters
> kernel.
> 2. In PF return path, a SIGCHLD signal is pending (a bash sub-shell
> which exited, likely on different cpu).

At this point, surely you need to save the VTYPE into a sigframe before
delivering the signal?

> 3. kernel resumes in userspace signal handler which ends up making an
> rt_sigreturn syscall - and which as specified discards the V state (and
> makes VTYPE reg invalid).

The state is discarded at syscall entry, but the body of rt_sigreturn() runs
*after* that discard. If you saved the original VTYPE into the sigframe prior
to delivering the signal, rt_sigreturn() should be able to restore it
regardless of whether it'd be clobbered at syscall entry.

Surely you *must* save/restore VTYPE in the signal frame? Otherwise the signal
handler can't make any syscall whatsoever, or it's responsible for saving and
restoring VTYPE in userspace, which doesn't seem right.
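
To illustrate the ordering I mean, here's a toy userspace model in C. This is
only a sketch, not kernel code: every type and function name in it is made up
for illustration, and the VTYPE value is an arbitrary placeholder. It just
shows that a discard at syscall entry is harmless so long as the sigframe
carries the interrupted context's vector state and rt_sigreturn() restores it
after the discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical types and helpers, not the kernel's. */
struct vstate {
    uint64_t vtype; /* configuration written by vsetvli/vsetivli */
    bool     vill;  /* true once the state has been discarded */
};

struct sigframe {
    bool          has_vstate; /* did signal delivery save vector state? */
    struct vstate saved;
};

/* Syscall entry: the ABI permits the kernel to discard vector state. */
static void syscall_entry_discard(struct vstate *v)
{
    v->vtype = 0;
    v->vill = true;
}

/* Signal delivery: optionally snapshot the interrupted context's state. */
static void deliver_signal(const struct vstate *v, struct sigframe *f, bool save)
{
    f->has_vstate = save;
    if (save)
        f->saved = *v;
}

/* rt_sigreturn: the discard happens at entry, the restore happens after it. */
static void rt_sigreturn(struct vstate *v, const struct sigframe *f)
{
    assert(v && f);
    syscall_entry_discard(v);
    if (f->has_vstate)
        *v = f->saved;
}

/* Returns true if the faulting vector store resumes with a valid VTYPE. */
bool resume_ok_after_signal(bool save_in_sigframe)
{
    const uint64_t original_vtype = 0xc7; /* placeholder value */
    struct vstate v = { .vtype = original_vtype, .vill = false };
    struct sigframe f = { 0 };

    deliver_signal(&v, &f, save_in_sigframe); /* page-fault return path */
    rt_sigreturn(&v, &f);                     /* signal handler returns */
    return !v.vill && v.vtype == original_vtype;
}
```

In this model resume_ok_after_signal(true) reports that the store can be
resumed, while resume_ok_after_signal(false) leaves VTYPE invalid, which is
the SIGILL case described in step 4.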

> 4. When sigreturn finally returns to original Vector store instruction,
> invalid VTYPE triggers an Illegal instruction which causes a SIGILL (as
> state was discarded above).
>
> So there is no way dropping syscall state would work here.

As above, I don't think that's quite true. It sounds to me like the actual
bug is that you don't save+restore VTYPE in the signal frame?

> How do you guys handle this for SVE/SME ? One way would be to not do the
> discard in rt_sigreturn codepath, but I don't see that - granted I'm not
> too familiar with arch/arm64/*/**

IIUC this works on arm64 because we'll save all the original state when we
deliver the signal, then restore that state *after* entry to the rt_sigreturn()
syscall.

I can go dig into that tomorrow, but I don't see how this can work unless we
save *all* state prior to delivering the signal, and restore *all* that state
from the sigframe.

> Other thing I wanted to ask is, have there been any perf implications of
> this ABI decision: as in if this was other way around, userspace (and/or
> compilers) could potentially leverage the fact that SVE/SME state would
> still be valid past a syscall - and won't have to reload/resetup etc.

I believe Mark Brown has made some changes recently to try to avoid some of
that impact. He might be able to comment on that.

Mark.