Re: [PATCH 02/19] riscv: cpufeature: Fix thead vector hwcap removal

From: Charlie Jenkins
Date: Fri Apr 12 2024 - 16:34:57 EST


On Fri, Apr 12, 2024 at 08:26:12PM +0100, Conor Dooley wrote:
> On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> > On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > > harts have the same values as the boot hart as a fallback.
> > > > > >
> > > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > > >
> > > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > > that the commit you're fixing here addresses will have these properties
> > > > > and if they did have them, they'd then also be new enough to hopefully
> > > > > not have "v" either - the issue is they're using whatever crap the
> > > > > vendor shipped.
> > > > > If we're gonna get the information from DT, we already have something
> > > > > that we can look at to perform the disable as the cpu compatibles give
> > > > > us enough information to make the decision.
> > > > >
> > > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > > repeating these ecalls on all systems.
> > > > >
> > > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > > knowledge the systems that this targets all have homogeneous
> > > > > marchid/mvendorid values of 0x0.
> > > >
> > > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > > usermode in a per-hart way, it would be better IMO if we really do
> > > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > > about it after the fact. Since we reported the wrong information to
> > > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > > up.
> > >
> > > You're misinterpreting, we do get the values on all individually as
> > > they're brought online. This is only used by the code that throws a bone
> > > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > > support the unratified version.
> >
> > Not quite,
>
> Remember that this patch stands in isolation and the justification given
> in your commit message does not mention anything other than fixing my
> broken patch.

Fixing the patch in the simplest sense would be to eagerly get the
mvendorid/marchid without using the cached version. But this assumes
that all harts have the same mvendorid/marchid. This is not something
that I am strongly attached to. If it truly is detrimental to Linux to
allow a user a way to specify different vendorids for different harts
then I will remove that code.

- Charlie

>
> > the alternatives are patched before the other cpus are
> > booted, so the alternatives will have false positives resulting in
> > broken kernels.
>
> Over-eagerly disabling vector isn't going to break any kernels and
> really should not break a behaving userspace either.
> Under-eagerly disabling it (in a way that this approach could solve) is
> only going to happen on a system where the boot hart has non-zero values
> and claims support for v but a non-boot hart has zero values and
> claims support for v but actually doesn't implement the ratified version.
> If the boot hart doesn't support v, then we currently disable the
> extension as only homogeneous stuff is supported by Linux. If the boot
> hart claims support for "v" but doesn't actually implement the ratified
> version neither the intent of my original patch nor this fix for it are
> going to help avoid a broken kernel.
>
> I think we do have a problem if the boot cpu having some erratum leads
> to the kernel being patched in a way that does not work for the other
> CPUs on the system, but I don't think this series addresses that sort of
> issue at all as you'd be adding code to the pi section if you were fixing
> it. I also don't think we should be making pre-emptive changes to the
> errata patching code either to solve that sort of problem, until an SoC
> shows up where things don't work.
> Cheers,
> Conor.