Re: [Patch] memory: tegra: Skip SID override from Guest VM

From: Thierry Reding
Date: Tue Feb 06 2024 - 12:08:52 EST

Next message: Joe Damato: "Re: [PATCH net-next] eth: mlx5: link NAPI instances to queues and IRQs"
Previous message: Ashay Jaiswal: "Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier"
In reply to: Marc Zyngier: "Re: [Patch] memory: tegra: Skip SID override from Guest VM"
Next in thread: Marc Zyngier: "Re: [Patch] memory: tegra: Skip SID override from Guest VM"

On Tue Feb 6, 2024 at 3:54 PM CET, Marc Zyngier wrote:
> On Tue, 06 Feb 2024 14:07:10 +0000,
> "Thierry Reding" <thierry.reding@xxxxxxxxx> wrote:
> >
> > On Tue Feb 6, 2024 at 1:53 PM CET, Marc Zyngier wrote:
> > > On Tue, 06 Feb 2024 12:28:27 +0000, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> > > > On 06/02/2024 12:17, Marc Zyngier wrote:
> > [...]
> > > > > - My own tegra186 HW doesn't have VHE, since it is ARMv8.0, and this
> > > > > helper will always return 'false'. How could this result in
> > > > > something that still works? Can I get a free CPU upgrade?
> > > >
> > > > I thought this API just checks to see if we are in EL2?
> > >
> > > It does. And that's the problem. On ARMv8.0, we run the Linux kernel
> > > at EL1. Tegra186 is ARMv8.0 (Denver + A57). So as written, this change
> > > breaks the very platform it intends to support.
> >
> > To clarify, the code that accesses these registers is shared across
> > Tegra186 and later chips. Tegra194 and later do support ARMv8.1 VHE.
>
> But even on these machines that are VHE-capable, not running at EL2
> doesn't mean we're running as a guest. The user can force the kernel
> to stick to EL1, using a command-line option such as kvm-arm.mode=nvhe
> which will force the kernel to stay at EL1 while deploying KVM at EL2.
>
> > Granted, if it always returns false on Tegra186 that's not what we
> > want.
>
> I'm glad we agree here.
>
> > > > > - If you assign this device to a VM and the hypervisor doesn't
> > > > > correctly virtualise it, then it is a different device and you
> > > > > should simply advertise it as something else. Or even better, fix
> > > > > your hypervisor.
> > > >
> > > > Sumit can add some more details on why we don't completely disable the
> > > > device for guest OSs.
> > >
> > > It's not about disabling it. It is about correctly supporting it
> > > (providing full emulation for it), or advertising it as something
> > > different so that SW can handle it differently.
> >
> > It's really not a different device. It's exactly the same device except
> > that accessing some registers isn't permitted. We also can't easily
> > remove parts of the register region from device tree because these are
> > intermixed with other registers that we do want access to.
>
> But that's the definition of being a different device. It has a
> different programming interface, hence it is different. The fact that
> it is the same HW block mediated by a hypervisor doesn't really change
> that.

The programming model isn't really different in these cases, but rather
restricted. I think a compatible string is a suboptimal way to describe
this.

> > > Poking into the internals of how the kernel is booted for a driver
> > > that isn't tied to the core architecture (because it would need to
> > > access system registers, for example) is not an acceptable outcome.
> >
> > So what would be the better option? Use a different compatible string to
> > make the driver handle the device differently? Or adding a custom
> > property to the device tree node to mark this as running in a
> > virtualized environment?
>
> A different compatible string would be my preferred option. An extra
> property would work as well. As far as I am concerned, these two
> options are the right way to express the fact that you have something
> that isn't quite like the real thing.

Coincidentally there's another discussion with a lot of similarities
regarding simulated platforms. For these it's usually less about the
register set being restricted and more about certain quirks that are
needed in simulation but will not ultimately be necessary for silicon.

This could be a timeout that's longer in simulation, or it could be
certain programming that would be needed in silicon but isn't necessary
or functional in simulation (think I/O calibration, that sort of thing).
One could argue that these are also different devices when in simulation
but they really aren't. They're more like an approximation of the actual
device that will be in silicon chips.

Another thing that both cases have in common is that the parameters
involved usually apply to the entire system. For some devices it is
easy to parameterize via DT (for example, for certain devices we have
bindings with special register regions that are only available in host
OS mode), but for others this may not be true. Adding extra compatible
strings for virtualization/simulation is going to get quite complex
very quickly if we need to differentiate between all of these
scenarios.

> > Perhaps we can reuse the top-level hypervisor node? That seems to only
> > ever have been used for Xen on 32-bit ARM, so not sure if that'd still
> > be appropriate.
>
> I'd shy away from this. You would be deriving properties from a
> hypervisor implementation, instead of expressing those properties
> directly. In my experience, the direct method is always preferable.

I would generally agree. However, I think the compatible string
solution in particular could turn very ugly here. If we express these
properties via compatible strings we may very well end up with many
different compatible strings to cover all cases.

Say you've got one hypervisor that changes the programming model in a
certain way and a second hypervisor that constrains it in a different
way. Do we now need one compatible string for each hypervisor? Do we add
compatible strings for each restriction and have potentially very long
compatible string lists? Separate properties would work slightly better
for this.
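
To make the trade-off a bit more concrete, here is a rough userspace
sketch in C (not kernel code). Everything in it is made up for
illustration: the fake_node type, the "example,..." property names and
the restriction flags do not correspond to any existing binding. It
models a node that lists each restriction as a separate boolean
property, so the driver tests restrictions independently instead of
matching one compatible string per combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical restriction flags; not taken from any real binding. */
enum mc_restriction {
	MC_NO_SID_OVERRIDE = 1u << 0,	/* SID override registers are off limits */
	MC_NO_ERROR_REGS   = 1u << 1,	/* error status registers are off limits */
};

/* Minimal stand-in for a device tree node: just a list of property names. */
struct fake_node {
	const char *const *props;
	size_t num_props;
};

/* Userspace analogue of a boolean property lookup. */
static bool fake_node_has_prop(const struct fake_node *np, const char *name)
{
	size_t i;

	assert(np != NULL && name != NULL);

	for (i = 0; i < np->num_props; i++)
		if (strcmp(np->props[i], name) == 0)
			return true;

	return false;
}

/* Each restriction maps to exactly one (made-up) property. */
static unsigned int mc_parse_restrictions(const struct fake_node *np)
{
	unsigned int flags = 0;

	if (fake_node_has_prop(np, "example,no-sid-override"))
		flags |= MC_NO_SID_OVERRIDE;
	if (fake_node_has_prop(np, "example,no-error-regs"))
		flags |= MC_NO_ERROR_REGS;

	return flags;
}

/*
 * Worst case for the compatible string approach: one string per
 * combination of n independent restrictions.
 */
static unsigned long compat_strings_needed(unsigned int n)
{
	return 1ul << n;
}
```

With n independent restrictions this needs n properties, whereas
covering every combination with compatible strings needs up to 2^n of
them in the worst case.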

There are some cases where we can use register contents to determine
what the OS is allowed to do, but these registers don't exist for all HW
blocks. We may be able to get more added to new chips, but we obviously
can't retroactively add them for existing ones.
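
For the HW blocks where such a register does exist, the check could
look roughly like the sketch below. The register layout and all names
are hypothetical and only meant to show the idea; a real driver would
use readl() on a mapped register rather than a plain pointer
dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one bit per register group the OS may write. */
#define CAP_SID_OVERRIDE_WRITABLE	(UINT32_C(1) << 0)

/* Stand-in for an MMIO read of the (made-up) capability register. */
static uint32_t fake_read_caps(const volatile uint32_t *caps_reg)
{
	assert(caps_reg != NULL);

	return *caps_reg;
}

/* The driver asks the hardware instead of guessing from the boot mode. */
static bool sid_override_allowed(const volatile uint32_t *caps_reg)
{
	return (fake_read_caps(caps_reg) & CAP_SID_OVERRIDE_WRITABLE) != 0;
}
```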

A central node or property would at least allow broad parameterization.
I would hope that at least hypervisor implementations don't vary too
much in terms of what they restrict and what they don't, so perhaps it
wouldn't be that bad. Perhaps that's also overly optimistic.
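
As a rough illustration of what broad parameterization could mean,
consider the following userspace sketch. The value strings, the sys_env
type and the timeout numbers are all assumptions made up for this
example and are not an existing binding or kernel API. The idea is that
one system-wide value is parsed once and individual drivers derive
their behaviour from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical system-wide environment derived from one central property. */
enum sys_env {
	SYS_ENV_NATIVE,
	SYS_ENV_GUEST,
	SYS_ENV_SIMULATION,
};

/* Parse the (made-up) property value; absent or unknown means native. */
static enum sys_env sys_env_parse(const char *value)
{
	if (value == NULL)
		return SYS_ENV_NATIVE;
	if (strcmp(value, "guest") == 0)
		return SYS_ENV_GUEST;
	if (strcmp(value, "simulation") == 0)
		return SYS_ENV_SIMULATION;

	return SYS_ENV_NATIVE;
}

/* Memory controller policy: skip SID override when running as a guest. */
static bool mc_may_override_sid(enum sys_env env)
{
	assert(env <= SYS_ENV_SIMULATION);

	return env != SYS_ENV_GUEST;
}

/* Some other driver: use a longer timeout in simulation (values invented). */
static unsigned int example_timeout_ms(enum sys_env env)
{
	return env == SYS_ENV_SIMULATION ? 10000 : 100;
}
```

This only works as long as implementations agree on what a given
environment restricts. Once they diverge, a single value no longer
carries enough information.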

Thierry

