Re: [PATCH v3 4/8] RISC-V: Check Zicclsm to set unaligned access speed

From: Clément Léger
Date: Mon Jul 01 2024 - 10:20:45 EST




On 01/07/2024 15:58, Conor Dooley wrote:
> On Mon, Jul 01, 2024 at 09:15:09AM +0200, Clément Léger wrote:
>>
>>
>> On 27/06/2024 23:20, Charlie Jenkins wrote:
>>> On Wed, Jun 26, 2024 at 03:39:14PM +0100, Conor Dooley wrote:
>>>> On Mon, Jun 24, 2024 at 08:49:57PM -0400, Jesse Taube wrote:
>>>>> Check for Zicclsm before checking for unaligned access speed. This will
>>>>> greatly reduce the boot up time as finding the access speed is no longer
>>>>> necessary.
>>>>>
>>>>> Signed-off-by: Jesse Taube <jesse@xxxxxxxxxxxx>
>>>>> ---
>>>>> V2 -> V3:
>>>>> - New patch split from previous patch
>>>>> ---
>>>>> arch/riscv/kernel/unaligned_access_speed.c | 26 ++++++++++++++--------
>>>>> 1 file changed, 17 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
>>>>> index a9a6bcb02acf..329fd289b5c8 100644
>>>>> --- a/arch/riscv/kernel/unaligned_access_speed.c
>>>>> +++ b/arch/riscv/kernel/unaligned_access_speed.c
>>>>> @@ -259,23 +259,31 @@ static int check_unaligned_access_speed_all_cpus(void)
>>>>> kfree(bufs);
>>>>> return 0;
>>>>> }
>>>>> +#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
>>>>> +static int check_unaligned_access_speed_all_cpus(void)
>>>>> +{
>>>>> + return 0;
>>>>> +}
>>>>> +#endif
>>>>>
>>>>> static int check_unaligned_access_all_cpus(void)
>>>>> {
>>>>> - bool all_cpus_emulated = check_unaligned_access_emulated_all_cpus();
>>>>> + bool all_cpus_emulated;
>>>>> + int cpu;
>>>>> +
>>>>> + if (riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICCLSM)) {
>>>>> + for_each_online_cpu(cpu) {
>>>>> + per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_FAST;
>>>>
>>>> - const: zicclsm
>>>> description:
>>>> The standard Zicclsm extension for misaligned support for all regular
>>>> load and store instructions (including scalar and vector) but not AMOs
>>>> or other specialized forms of memory access. Defined in the
>>>> RISC-V RVA Profiles Specification.
>>>>
>>>> Doesn't, unfortunately, say anywhere there that they're actually fast :(
>>>
>>> Oh no! That is unfortunate that the ISA does not explicitly call that
>>> out, but I think that acceptable.
>>>
>>> If a vendor puts Zicclsm in their isa string, they should expect
>>> software to take advantage of misaligned accesses. FAST is our signal to
>>> tell software that they should emit misaligned accesses.
>>
>> AFAIK, Zicclsm is not even an ISA extension, simply a profile
>> specification which means that only the execution environment which
>> provides the profile support misaligned accesses (cf
>> https://lists.riscv.org/g/tech-profiles/message/56).
>
> I dunno, the specification status page used to describe it as an
> extension:
> https://wiki.riscv.org/display/HOME/Specification+Status+-+Historical
> My understanding was that these could be considered extensions, just
> like we are considering svade to be one.
>
>> . I don't think we
>> can extrapolate that the misaligned accesses will be fast at all.
>
> That is my opinion on it too. If it doesn't say "fast" and give a
> definition for what that means in the binding, then we can't assume that
> it is fast. I'm also wary of extending definitions of extensions in the
> binding, because a) I am 90% sure that people writing devicetrees don't
> care and b) it'd be a potential difference between DT and ACPI without a
> real justification (unlike the zkr or svade/svadu situations).

BTW, the profile spec [1] has a note that states the following for Zicclsm:

"Even though mandated, misaligned loads and stores might execute
extremely slowly. Standard software distributions should assume their
existence only for correctness, not for performance."

Which was also quoted in patch 1, so I guess that settles it.

Thanks,

Clément

Link:
https://github.com/riscv/riscv-profiles/blob/main/src/profiles.adoc?plain=1#L524
[1]

>
>>> This allows for a generic kernel, like the one a distro would compile, to
>>> skip the probing when booting on a system that explicitly called out
>>> that the hardware supports misaligned accesses.