Re: RISCV Vector unit disabled by default for new task (was Re: [PATCH v12 17/17] riscv: prctl to enable vector commands)

From: Darius Rad
Date: Tue Dec 13 2022 - 11:43:25 EST


On Fri, Dec 09, 2022 at 11:42:19AM -0800, Vineet Gupta wrote:
>
> But keeping the V unit disabled by default and using prctl as a gatekeeper
> to enable it feels unnecessary and tedious.
> Here's my reasoning below (I'm collating comments from prior msgs as well).

Please refer to the previous discussion [1], which covered topics that
have not come up recently.

[1] https://lists.infradead.org/pipermail/linux-riscv/2021-September/thread.html#8361

>
> 1. Doesn't it add another userspace ABI which is already a headache for this
> feature. And that needs to be built into not just libc but potentially other
> runtimes too. Even after implementation there will be an interim pain as the
> new prctl takes time to trickle down into tooling and headers. Besides the
> new stuff will never be compatible with older kernel but that is a minor
> point since older kernel not supporting V is a deal breaker anyways.
>

None of this is relevant because there is no existing user space ABI for
vector. It is being invented now. If this is done poorly, for example, by
missing this opportunity to add a mechanism for user space to request use
of the vector extension, it will be much more painful to add later.

> 2. People want the prctl gatekeeping for ability to gracefully handle memory
> allocation failure for the extra V-state within kernel. But that is only
> additional 4K (for typical 128 wide V regs) per task.

But vector state scales up to as much as 256 KiB (32 registers at the
architectural maximum of VLEN=65536). Are you suggesting that there is no
possibility that future systems would support more than VLEN=128?
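
To make the scaling concrete, here is a minimal sketch of the arithmetic.
It assumes the state in question is just the 32 vector registers (the
vector CSRs are ignored), and it takes the upper bound of VLEN=65536 bits
from the V extension specification. The function name is made up for
illustration and does not correspond to anything in the kernel.

```c
#include <stddef.h>

/* Number of architectural vector registers in the V extension. */
#define NUM_VREGS 32

/*
 * Bytes needed to hold the vector register file for a given VLEN
 * (register width in bits). Hypothetical helper; CSRs are not counted.
 */
size_t vreg_state_bytes(size_t vlen_bits)
{
    return NUM_VREGS * (vlen_bits / 8);
}
```

By this count the register file alone is 512 bytes at VLEN=128, 4 KiB at
VLEN=1024, and 256 KiB at VLEN=65536.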

> If that is failing,
> the system is not doing well anyways. Besides it is not an issue at all
> since ENOMEM in clone/execve for the additional space should handle the
> failure anyways. Only very sophisticated apps would downgrade from executing
> V to Scalar code if the prctl failed.

This seems unlikely. As vector support does not exist in any present
hardware, and the vector extension is only optional in the RISC-V profiles
that include it, I would think that it is almost certain that any
application that supports V would have a fallback path for when the V
extension is not available.
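
As a rough sketch of what such a fallback path could look like when user
space has to request access first: the request function below stands in
for whatever interface is eventually chosen (it is hypothetical, as are
all the names here), and the "vector" routine is only a placeholder with
the same behavior as the scalar one.

```c
#include <stddef.h>

/* Hypothetical request hook: returns 0 if vector use was granted. */
typedef int (*request_vector_fn)(void);

typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Scalar implementation, always available. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Placeholder for a build that uses V instructions; same result here. */
void add_vector(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Example stubs standing in for a real request to the kernel. */
int request_granted(void) { return 0; }
int request_denied(void) { return -1; }

/* Pick the vector routine only if the request succeeded. */
add_fn select_add(request_vector_fn request)
{
    if (request != NULL && request() == 0)
        return add_vector;
    return add_scalar;
}
```

An application would call select_add() once at startup and use the
returned pointer afterwards. A denied request then looks the same to the
application as running on hardware without the V extension.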


Another motivation for requiring that user space request use of the vector
extension is that the vector unit may be shared between multiple harts
and/or have power or performance implications in the system. By requiring
that user space request access, it allows the system to decline that
access, and user space can handle this gracefully.

If we add a mechanism for user space to request access to the vector
extension, and it turns out that it was unnecessary, the worst that has
happened is a slight inconvenience.

If we do not add such a mechanism, and later determine that it is
necessary, we have a much greater problem. There would be backward
compatibility issues with the ABI, and such a mechanism could probably not
be fully implemented at all, because vector code written before the
mechanism existed would by then be legacy code that still has to be
supported.

There is a similar problem on x86. According to some, it was handled poorly
with AVX-512, which lacked this type of mechanism, and handled better with
AMX [2]. There is an opportunity to learn from that experience and do
things better on RISC-V.

[2] https://lore.kernel.org/lkml/87k0ntazyn.ffs@xxxxxxxxxxxxxxxxxxxxxxx/


// darius