Re: [PATCH 02/19] riscv: cpufeature: Fix thead vector hwcap removal
From: Conor Dooley
Date: Fri Apr 12 2024 - 15:26:29 EST
On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
> On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
> > On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
> > > On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote:
> > > > > The riscv_cpuinfo struct that contains mvendorid and marchid is not
> > > > > populated until all harts are booted which happens after the DT parsing.
> > > > > Use the vendorid/archid values from the DT if available or assume all
> > > > > harts have the same values as the boot hart as a fallback.
> > > > >
> > > > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
> > > >
> > > > If this is our only use case for getting the mvendorid/marchid stuff
> > > > from dt, then I don't think we should add it. None of the devicetrees
> > > > that the commit you're fixing here addresses will have these properties
> > > > and if they did have them, they'd then also be new enough to hopefully
> > > > not have "v" either - the issue is they're using whatever crap the
> > > > vendor shipped.
> > > > If we're gonna get the information from DT, we already have something
> > > > that we can look at to perform the disable as the cpu compatibles give
> > > > us enough information to make the decision.
> > > >
> > > > I also think that we could just cache the boot CPU's marchid/mvendorid,
> > > > since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid
> > > > repeating these ecalls on all systems.
> > > >
> > > > Perhaps for now we could just look at the boot CPU alone? To my
> > > > knowledge the systems that this targets all have homogeneous
> > > > marchid/mvendorid values of 0x0.
> > >
> > > It's possible I'm misinterpreting, but is the suggestion to apply the
> > > marchid/mvendorid we find on the boot CPU and assume it's the same on
> > > all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to
> > > usermode in a per-hart way, it would be better IMO if we really do
> > > query marchid/mvendorid/mimpid on each hart. The problem with applying
> > > the boot CPU's value everywhere is if we're ever wrong in the future
> > > (ie that assumption doesn't hold on some machine), we'll only find out
> > > about it after the fact. Since we reported the wrong information to
> > > usermode via hwprobe, it'll be an ugly userspace ABI issue to clean
> > > up.
> >
> > You're misinterpreting, we do get the values on all individually as
> > they're brought online. This is only used by the code that throws a bone
> > to people with crappy vendor dtbs that put "v" in riscv,isa when they
> > support the unratified version.
>
> Not quite,
Remember that this patch stands in isolation and the justification given
in your commit message does not mention anything other than fixing my
broken patch.
> the alternatives are patched before the other cpus are
> booted, so the alternatives will have false positives resulting in
> broken kernels.
Over-eagerly disabling vector isn't going to break any kernels and
really should not break a behaving userspace either.
Under-eagerly disabling it (in a way that this approach could solve) is
only going to happen on a system where the boot hart has non-zero values
and claims support for v but a non-boot hart has zero values and
claims support for v but actually doesn't implement the ratified version.
If the boot hart doesn't support v, then we currently disable the
extension as only homogeneous stuff is supported by Linux. If the boot
hart claims support for "v" but doesn't actually implement the ratified
version neither the intent of my original patch nor this fix for it are
going to help avoid a broken kernel.
I think we do have a problem if the boot cpu having some erratum leads
to the kernel being patched in a way that does not work for the other
CPUs on the system, but I don't think this series addresses that sort of
issue at all as you'd be adding code to the pi section if you were fixing
it. I also don't think we should be making pre-emptive changes to the
errata patching code either to solve that sort of problem, until an SoC
shows up where things don't work.
Cheers,
Conor.
Attachment:
signature.asc
Description: PGP signature