Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits

From: Deepak Gupta
Date: Fri Mar 22 2024 - 13:14:09 EST


On Thu, Mar 21, 2024 at 5:13 PM Samuel Holland
<samuel.holland@xxxxxxxxxx> wrote:
>
> On 2024-03-19 11:39 PM, Deepak Gupta wrote:
> >>>> --- a/arch/riscv/include/asm/switch_to.h
> >>>> +++ b/arch/riscv/include/asm/switch_to.h
> >>>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
> >>>> #define __switch_to_fpu(__prev, __next) do { } while (0)
> >>>> #endif
> >>>>
> >>>> +static inline void sync_envcfg(struct task_struct *task)
> >>>> +{
> >>>> + csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
> >>>> +}
> >>>> +
> >>>> +static inline void __switch_to_envcfg(struct task_struct *next)
> >>>> +{
> >>>> + if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> >>>
> >>> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> >>> even if ALTERNATIVES was turned on.
> >>> Can you check disasm on your end as well. IMHO, `entry.S` is a better
> >>> place to pick up *envcfg.
> >>
> >> The branchiness is sort of expected, since that function is implemented by
> >> switching on/off a branch instruction, so the alternate code is necessarily a
> >> separate basic block. It's a tradeoff so we don't have to write assembly code
> >> for every bit of code that depends on an extension. However, the cost should be
> >> somewhat lowered since the branch is unconditional and so entirely predictable.
> >>
> >> If the branch turns out to be problematic for performance, then we could use
> >> ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.
> >
> > Yeah I lean towards using alternatives directly.
>
> One thing to note here: we can't use alternatives directly if the behavior needs
> to be different on different harts (i.e. a subset of harts implement the envcfg
> CSR). I think we need some policy about which ISA extensions are allowed to be
> asymmetric across harts, or else we add too much complexity.

As I've said elsewhere on this thread, we are adding too much complexity by
assuming that heterogeneous ISAs exist (they don't today). And even if they
did, it wouldn't work. Nobody wants to spend a lot of time figuring out which
harts implement which ISA and which packages were compiled for which ISA.
Most end users run `sudo apt-get install blah blah` and expect it to just
work. Heterogeneous ISAs don't work on other architectures either, and when
someone tried, they had to disable certain ISA features to make sure that all
cores expose the same ISA (search "AVX512 Intel Alder Lake disable").
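
To make the constraint Samuel describes concrete, here is a rough userspace
sketch (not kernel code). Alternatives patch one shared code sequence at boot,
so a patched-in CSR write is only safe if every hart implements the extension.
The names EXT_ENVCFG, common_isa() and can_patch_globally() are made up for
illustration and are not existing kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension bit, for illustration only (not a kernel ID). */
#define EXT_ENVCFG (UINT64_C(1) << 0)

/*
 * Return the set of extensions implemented by every hart, i.e. the
 * bitwise AND of the per-hart ISA bitmaps. An empty hart list yields 0.
 */
static uint64_t common_isa(const uint64_t *hart_isa, size_t nr_harts)
{
	uint64_t common = nr_harts ? ~UINT64_C(0) : 0;

	assert(hart_isa != NULL || nr_harts == 0);

	for (size_t i = 0; i < nr_harts; i++)
		common &= hart_isa[i];

	return common;
}

/*
 * Code patched once at boot is executed by all harts, so it may only
 * depend on an extension that is present on all of them.
 */
static bool can_patch_globally(const uint64_t *hart_isa, size_t nr_harts,
			       uint64_t ext)
{
	return (common_isa(hart_isa, nr_harts) & ext) == ext;
}
```

With a symmetric system, e.g. { EXT_ENVCFG, EXT_ENVCFG }, can_patch_globally()
returns true and a direct ALTERNATIVE is fine. With { EXT_ENVCFG, 0 } it
returns false, and we are back to a per-hart runtime check.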

>
> Regards,
> Samuel
>