Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed tests

From: Alexandre Ghiti
Date: Tue Mar 18 2025 - 08:58:32 EST

Next message: Michal Koutný: "Re: [PATCH 1/2] mm: vmscan: Split proactive reclaim statistics from direct reclaim statistics"
Previous message: Jason Gunthorpe: "Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags"
In reply to: Andrew Jones: "Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed tests"
Next in thread: Andrew Jones: "Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed tests"

On 18/03/2025 13:45, Andrew Jones wrote:

On Tue, Mar 18, 2025 at 01:13:18PM +0100, Alexandre Ghiti wrote:

On 18/03/2025 09:48, Andrew Jones wrote:

On Mon, Mar 17, 2025 at 03:39:01PM +0100, Alexandre Ghiti wrote:

Hi Drew,

On 04/03/2025 13:00, Andrew Jones wrote:

Allow skipping scalar and vector unaligned access speed tests. This
is useful for testing alternative code paths and to skip the tests in
environments where they run too slowly. All CPUs must have the same
unaligned access speed.
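Below is a minimal user-space sketch of what such a parameter could look like. A string value is mapped to a speed class. When the value is set, probing is skipped and the same speed is recorded for every CPU, which matches the "all CPUs must have the same speed" constraint. The parameter values ("slow", "fast", "unsupported"), the enum, `NR_CPUS`, and all helper names are assumptions for illustration, not the patch's actual interface.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 8 /* hypothetical CPU count for this sketch */

/* Hypothetical speed classes, loosely modeled on what hwprobe reports. */
enum access_speed {
	SPEED_UNKNOWN = 0,
	SPEED_EMULATED,
	SPEED_SLOW,
	SPEED_FAST,
	SPEED_UNSUPPORTED,
};

static enum access_speed per_cpu_speed[NR_CPUS];

/* Parse the value of a hypothetical "unaligned_scalar_speed=<value>" option.
 * Returns false, leaving *out untouched, for unrecognized values. */
static bool parse_speed_param(const char *val, enum access_speed *out)
{
	if (strcmp(val, "slow") == 0)
		*out = SPEED_SLOW;
	else if (strcmp(val, "fast") == 0)
		*out = SPEED_FAST;
	else if (strcmp(val, "unsupported") == 0)
		*out = SPEED_UNSUPPORTED;
	else
		return false;
	return true;
}

/* Stand-in for the real timing-based probe. */
static enum access_speed probe_speed(int cpu)
{
	(void)cpu;
	return SPEED_SLOW;
}

/* If the user forced a valid speed, apply it to every CPU and skip probing;
 * otherwise probe each CPU. Returns true if probing was skipped. */
static bool init_access_speed(const char *param)
{
	enum access_speed forced;

	if (param && parse_speed_param(param, &forced)) {
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			per_cpu_speed[cpu] = forced;
		return true;
	}
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_speed[cpu] = probe_speed(cpu);
	return false;
}
```

In this model an unrecognized value simply falls back to probing rather than failing the boot, which is one reasonable choice among several.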

I'm not a big fan of the command line parameter. This is not where we should
push uarch decisions, because there could be many others in the future. To me,
the best solution would be in DT/ACPI, and since the DT folks, according
to Palmer, shut down that solution, what remains is using an extension.

I have been reading a bit about unaligned accesses. Zicclsm was described as
"Even though mandated, misaligned loads and stores might execute extremely
slowly. Standard software distributions should assume their existence only
for correctness, not for performance." in rva20/22 but *not* in rva23. So
what about using this "hole" and considering that a platform that *advertises*
Zicclsm has fast unaligned accesses? After internal discussion, it
actually does not make sense to advertise Zicclsm if the platform's accesses
are slow, right?

This topic pops up every so often, including in yesterday's server
platform TG call. In that call, and, afaict, every other time it has
popped up, the result is to reiterate that ISA extensions never say
anything about performance. So, Zicclsm will never mean fast, and we
likely won't be able to add any extension that does.

Ok, I should not say "fast". Usually, when an extension is advertised by a
platform, we don't question its speed (Zicboz, Zicbom, etc.); we simply use
it, and it's up to the vendor to benchmark its implementation and act
accordingly (i.e. not set it in the ISA string).

arm64, for example, considers that armv8 has fast unaligned accesses and can
therefore enable HAVE_EFFICIENT_UNALIGNED_ACCESS in the kernel, even though some
uarchs are slow. Distros will very likely use rva23 as their baseline, so they
will enable Zicclsm, which would allow us to take advantage of this too. Without
it, we lose a lot of performance improvements in the kernel; see
https://lore.kernel.org/lkml/20231225044207.3821-1-jszhang@xxxxxxxxxx/.
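To illustrate what such a Kconfig switch buys: generic code can read a word from a possibly misaligned address with a single access when the architecture declares unaligned accesses efficient, and otherwise has to assemble the word byte by byte. The user-space sketch below models that choice. `EFFICIENT_UNALIGNED` is a hypothetical stand-in for the kernel's `CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS`, the helper names are made up, and the word path assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical switch standing in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. */
#ifndef EFFICIENT_UNALIGNED
#define EFFICIENT_UNALIGNED 1
#endif

/* Slow path: build a little-endian 64-bit value one byte at a time,
 * so no misaligned access is ever issued. */
static uint64_t load_le64_bytewise(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Fast path: one (possibly misaligned) 8-byte load. memcpy keeps this
 * well-defined C; on targets where misaligned loads are allowed, compilers
 * typically lower it to a single load. Assumes a little-endian host. */
static uint64_t load_le64_word(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Pick the path at build time, as a Kconfig-gated helper would. */
static uint64_t load_le64(const unsigned char *p)
{
#if EFFICIENT_UNALIGNED
	return load_le64_word(p);
#else
	return load_le64_bytewise(p);
#endif
}
```

Both paths return the same value. The difference is one memory access versus eight loads plus shifts, which adds up in hot paths such as string and checksum helpers.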

Or we could have a new named feature for this, even though it's weird to
have a named feature which would basically mean "Zicclsm is fast". We don't
have, for example, a named feature to say "Zicboz is fast" but given the
vague wording in the profile spec, maybe we can ask for one in that case?

Sorry for the late review and for triggering this debate...

No problem, let's try to pick the best option. I'll try listing all the
options and their pros/cons.

1. Leave as is, which is to always probe
pro: Nothing to do
con: Not ideal in all environments

2. New DT/ACPI description
pro: Describing whether or not misaligned accesses are implemented in
HW (which presumably means fast) is something that should be done
in HW descriptions
con: We'll need to live with probing until we can get the descriptions
defined, which may be never if there's too much opposition

3. Command line
pro: Easy and serves its purpose, which is to skip probing in the
environments where probing is not desired
con: Yet another command line option (which we may want to deprecate
someday)

4. New ISA extension
pro: Easy to add to HW descriptions
con: Not likely to get it through ratification

5. New SBI FWFT feature
pro: Probably easier to get through ratification than an ISA extension
con: Instead of probing, the kernel would have to ask SBI -- would that
even be faster? Will all the environments that want to skip
probing even have a complete SBI?

6. ??

So what about:

7. New enum value describing the performance as "FORCED" or "HW" (or
something better)
pro: We only use the existing Zicclsm
con: It's not clear that the accesses are fast, but it basically tells
SW "don't think too much, I'm telling you that you can use it"; it's up to us to
describe this correctly so that users understand it.
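Option 7 can be sketched roughly as follows. When the ISA string advertises Zicclsm, the kernel reports a new "forced" value and skips probing; otherwise it falls back to the probe. `PERF_FORCED`, `resolve_perf()`, and `fake_probe()` are hypothetical names, and nothing here reflects an agreed interface.

```c
#include <stdbool.h>

enum misaligned_perf {
	PERF_UNKNOWN,
	PERF_EMULATED,
	PERF_SLOW,
	PERF_FAST,
	PERF_FORCED, /* hypothetical option 7: "the platform says use it" */
};

static int probe_calls;

/* Stand-in for the timing-based probe; counts how often it runs. */
static enum misaligned_perf fake_probe(void)
{
	probe_calls++;
	return PERF_SLOW;
}

/* Decide the reported value. probe() is only called when the ISA string
 * does not advertise Zicclsm. */
static enum misaligned_perf resolve_perf(bool has_zicclsm,
					 enum misaligned_perf (*probe)(void))
{
	if (has_zicclsm)
		return PERF_FORCED;
	return probe();
}
```

Note that `PERF_FORCED` carries no information about whether the accesses are handled in hardware or emulated. It only records that the platform asked software to use them.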

But Zicclsm doesn't mean misaligned accesses are in HW, it just means
they're not going to explode.

They never explode: if they are not supported by the HW, we already rely on S-mode emulation.

We'd still need the probing to find out
if the accesses are emulated (slow) or hw (fast). We at least want to
know the answer to that question because we advertise it to userspace
through hwprobe.
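The classification such a probe performs can be modeled with two assumed inputs. The first is whether a test misaligned access trapped and was emulated. The second is the time taken to copy the same misaligned buffer with word-sized accesses versus single-byte accesses. The names are hypothetical, and the rule "word faster than byte means fast" is a simplification of the idea behind the kernel's probe, not its exact code.

```c
#include <stdbool.h>
#include <stdint.h>

enum misaligned_speed {
	MIS_UNKNOWN,
	MIS_EMULATED,
	MIS_SLOW,
	MIS_FAST,
};

/*
 * was_emulated: a test access trapped and was handled in software.
 * word_cycles / byte_cycles: best observed time to copy the same misaligned
 * buffer using word-sized accesses versus single-byte accesses.
 */
static enum misaligned_speed classify(bool was_emulated,
				      uint64_t word_cycles,
				      uint64_t byte_cycles)
{
	if (was_emulated)
		return MIS_EMULATED;
	if (word_cycles == 0 || byte_cycles == 0)
		return MIS_UNKNOWN; /* no usable measurement */
	return word_cycles < byte_cycles ? MIS_FAST : MIS_SLOW;
}
```

Only the timing branch needs the slow measurement loop. A forced value from the command line or firmware would replace that branch, not the emulation check.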

(BTW, another pro of the command line is that it can be used to test
both slow and fast paths without recompiling.)

Thanks,
drew
