Re: RISCV Vector unit disabled by default for new task (was Re: [PATCH v12 17/17] riscv: prctl to enable vector commands)

From: Vineet Gupta
Date: Thu Dec 15 2022 - 13:57:14 EST




On 12/15/22 07:33, Richard Henderson wrote:
On 12/15/22 04:28, Florian Weimer via Libc-alpha wrote:
* Björn Töpel:

For SVE, it is in fact disabled by default in the kernel.  When a thread
executes the first SVE instruction, it will cause an exception, the kernel
will allocate memory for SVE state and enable TIF_SVE. Further use of SVE
instructions will proceed without exceptions.  Although SVE is disabled by
default, it is enabled automatically.  Since this is done automatically
during an exception handler, there is no opportunity for memory allocation
errors to be reported, as there are in the AMX case.
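To make the lazy scheme above concrete, here is a small user-space C model of it. It is hypothetical: struct thread_model, sve_access_trap() and exec_sve_insn() are illustrative names, not kernel code, and the sve_enabled flag only stands in for TIF_SVE. It shows the two properties described above. Only the first SVE instruction traps, and the state allocation happens inside the trap handler, where there is no caller to report a failure to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state; not the kernel's real data structures. */
struct thread_model {
    bool sve_enabled;          /* stands in for TIF_SVE */
    unsigned char *sve_state;  /* lazily allocated register backing store */
    size_t state_size;         /* bytes needed for the saved SVE state */
    unsigned traps;            /* number of "SVE access" exceptions taken */
};

/* Model of the exception handler run on the first SVE instruction. */
static void sve_access_trap(struct thread_model *t)
{
    t->traps++;
    if (!t->sve_state) {
        t->sve_state = calloc(1, t->state_size);
        if (!t->sve_state) {
            /* There is no caller to return an error to; abort() stands in
             * for having to fail the task outright. */
            abort();
        }
    }
    t->sve_enabled = true;
}

/* Model of user space executing one SVE instruction. */
static void exec_sve_insn(struct thread_model *t)
{
    if (!t->sve_enabled)
        sve_access_trap(t);
    /* ... the instruction itself would execute here, without trapping ... */
}
```

Calling exec_sve_insn() repeatedly on a zero-initialised thread_model takes exactly one trap; every later call runs without one.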

Glibc has an SVE-optimized memcpy, right? Doesn't that mean that pretty
much all processes on an SVE-capable system will enable SVE (lazily)? If
so, that's close to "enabled by default" (unless SVE is disabled
system-wide).

Yes, see sysdeps/aarch64/multiarch/memcpy.c:

    static inline __typeof (__redirect_memcpy) *
    select_memcpy_ifunc (void)
    {
      INIT_ARCH ();
      if (sve && HAVE_AARCH64_SVE_ASM)
        {
          if (IS_A64FX (midr))
            return __memcpy_a64fx;
          return __memcpy_sve;
        }
      if (IS_THUNDERX (midr))
        return __memcpy_thunderx;
      if (IS_THUNDERX2 (midr) || IS_THUNDERX2PA (midr))
        return __memcpy_thunderx2;
      if (IS_FALKOR (midr) || IS_PHECDA (midr))
        return __memcpy_falkor;
      return __memcpy_generic;
    }

And the __memcpy_sve implementation actually uses SVE.
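A stripped-down C model of the selection above, assuming a single hypothetical has_sve flag in place of glibc's HWCAP and MIDR checks. struct cpu_model, select_memcpy_model() and the *_model functions are illustrative names, and both variants simply call memcpy. What it illustrates is that the choice depends only on what the hardware supports, so any process on an SVE machine that calls memcpy ends up executing the SVE variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical stand-ins; in glibc these are separate assembly routines. */
static void *memcpy_generic_model(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

static void *memcpy_sve_model(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);  /* the real one would execute SVE instructions */
}

/* Hypothetical CPU description; glibc derives this from HWCAP and MIDR_EL1. */
struct cpu_model {
    bool has_sve;
};

/* Resolver: consulted once, based purely on hardware capability. */
static memcpy_fn select_memcpy_model(const struct cpu_model *cpu)
{
    if (cpu->has_sve)
        return memcpy_sve_model;  /* every later memcpy call runs SVE code */
    return memcpy_generic_model;
}
```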

If there were a prctl to select the vector width and enable the vector
extension, we'd have to pick a width in glibc anyway.

There *is* a prctl to adjust the SVE vector width, but glibc does not need to select one, because SVE code dynamically adjusts to the currently enabled width.  The kernel selects a default width that fits within the default signal frame size.
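A scalar C sketch of why no width has to be chosen in advance. The loop is written against a runtime vector length, vl_bytes, a hypothetical parameter standing in for what SVE's CNTB (or RVV's vsetvli with SEW=8) would report, and it handles the tail by shortening the last step rather than assuming a fixed width. It is a model only and uses no vector intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a vector-length-agnostic copy loop.  Returns the number
 * of "vector" iterations, which varies with vl_bytes while the code does not. */
static size_t vla_copy(unsigned char *dst, const unsigned char *src,
                       size_t n, size_t vl_bytes)
{
    assert(vl_bytes > 0);  /* a zero vector length would never make progress */
    size_t iterations = 0;
    for (size_t i = 0; i < n; i += vl_bytes) {
        /* A predicate (SVE WHILELT) or a shortened vl (RVV) covers only the
         * remaining tail, so no separate scalar epilogue is needed. */
        size_t active = (n - i < vl_bytes) ? n - i : vl_bytes;
        for (size_t j = 0; j < active; j++)
            dst[i + j] = src[i + j];
        iterations++;
    }
    return iterations;
}
```

The same binary copies 100 bytes in 7 steps with a 16-byte vector length and in 2 steps with a 64-byte one, which is why the width set via prctl does not have to be known when the code is written.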

The other thing of note for SVE is that, with the default function ABI, all of the SVE state is call-clobbered, which allows the kernel to drop the state instead of saving it across system calls.  (There is a separate vector function call ABI when SVE types are used.)

For the RV psABI it is similar: all V regs are caller-saved/call-clobbered [1], and syscalls are not required to preserve V regs [2].
However, last I checked the Arm documentation, the ABI doc seemed to suggest that some (parts) of the SVE regs are callee-saved [3].
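The two calling-convention rules being compared can be written down as a tiny lookup, assuming only what is cited in [1] and [3]: the RISC-V psABI treats v0-v31 as caller-saved, while the AAPCS64 base PCS requires a callee to keep only the low 64 bits of z8-z15 (the bits that alias d8-d15). enum abi and callee_preserved_bits() are hypothetical names, and other state such as the SVE predicate registers is not modelled.

```c
#include <assert.h>

enum abi { ABI_RISCV_V, ABI_AARCH64_BASE };

/* Number of low-order bits of vector register `reg` (v<reg> or z<reg>)
 * that a callee must preserve under the given ABI.  Encodes only the two
 * rules quoted in this thread. */
static unsigned callee_preserved_bits(enum abi abi, unsigned reg)
{
    switch (abi) {
    case ABI_RISCV_V:
        return 0;  /* all V registers are caller-saved */
    case ABI_AARCH64_BASE:
        return (reg >= 8 && reg <= 15) ? 64 : 0;  /* low 64 bits of z8-z15 */
    }
    return 0;
}
```

So on RISC-V nothing in the V register file survives a call, whereas on AArch64 a small slice of z8-z15 does, even though everything above bit 63 may be clobbered.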


So while strcpy may enable SVE for the thread, the next syscall may disable it again.

The next syscall could trash them, but will it disable SVE? Despite syscall/function-call clobbers, using V in tight loops such as mem*/str* is still a win.
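Whether a syscall merely clobbers the vector registers or also disables the unit makes a large difference to how often the lazy first-use trap fires. The C model below is hypothetical: the two policies, struct vthread and run() are illustrative, and neither policy is claimed to be what Linux does on either architecture. It runs a loop of "vectorized mem* call, then syscall" and counts traps under each policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two hypothetical things a kernel could do at syscall entry. */
enum syscall_policy {
    DISCARD_AND_DISABLE,   /* drop the state and disable: next use traps */
    DISCARD_KEEP_ENABLED   /* zero the registers but leave the unit usable */
};

struct vthread {
    bool v_enabled;
    unsigned char vregs[32];  /* tiny stand-in for the vector register file */
    unsigned traps;
};

static void use_vector(struct vthread *t)
{
    if (!t->v_enabled) {      /* lazy first-use trap */
        t->traps++;
        t->v_enabled = true;
    }
    t->vregs[0]++;            /* pretend to do some vector work */
}

static void do_syscall(struct vthread *t, enum syscall_policy p)
{
    /* The ABI lets the kernel clobber all vector registers here. */
    memset(t->vregs, 0, sizeof t->vregs);
    if (p == DISCARD_AND_DISABLE)
        t->v_enabled = false;
}

/* Run `iters` rounds of "vector mem* call, then syscall"; return trap count. */
static unsigned run(enum syscall_policy p, unsigned iters)
{
    struct vthread t = {0};
    for (unsigned i = 0; i < iters; i++) {
        use_vector(&t);
        do_syscall(&t, p);
    }
    return t.traps;
}
```

With DISCARD_AND_DISABLE every iteration takes a trap; with DISCARD_KEEP_ENABLED only the first one does, which is the case where the registers are clobbered but vectorized mem*/str* routines keep paying off.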


[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
[2] https://github.com/riscv/riscv-v-spec/blob/master/calling-convention.adoc
[3] https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs64/aapcs64.rst#the-base-procedure-call-standard
Sec 6.1.3: "... In other cases it need only preserve the low 64 bits of z8-z15"