Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX

From: Sean Christopherson
Date: Mon Feb 03 2025 - 19:27:14 EST

Next message: Peilin Ye: "Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions"
Previous message: Abhinav Kumar: "Re: [PATCH v6 3/7] drm/msm/hdmi: make use of the drm_connector_hdmi framework"
In reply to: Edgecombe, Rick P: "Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX"
Next in thread: Edgecombe, Rick P: "Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Feb 03, 2025, Rick P Edgecombe wrote:
> On Mon, 2025-02-03 at 12:33 -0800, Sean Christopherson wrote:
> > > Since there is no upstream KVM TDX support yet, why isn't it an option to
> > > still
> > > revert the EDKII commit too? It was a relatively recent change.
> >
> > I'm fine with that route too, but it too is a band-aid. Relying on the
> > *untrusted*
> > hypervisor to essentially communicate memory maps is not a winning strategy.
> >
> > > To me it seems that the normal KVM MTRR support is not ideal, because it is
> > > still lying about what it is doing. For example, in the past there was an
> > > attempt to use UC to prevent speculative execution accesses to sensitive
> > > data.
> > > The KVM MTRR support only happens to work with existing guests, but not all
> > > possible MTRR usages.
> > >
> > > Since diverging from the architecture creates loose ends like that, we could
> > > instead define some other way for EDKII to communicate the ranges to the
> > > kernel.
> > > Like some simple KVM PV MSRs that are for communication only, and not
> >
> > Hard "no" to any PV solution. This isn't KVM specific, and as above, bouncing
> > through the hypervisor to communicate information within the guest is asinine,
> > especially for CoCo VMs.
>
> Hmm, right.
>
> So the other options could be:
>
> 1. Some TDX module feature to hold the ranges:
> - Con: Not shared with AMD
>
> 2. Re-use MTRRs for the communication, revert changes in guest and edk2:

Thinking more about how EDK2 is consumed downstream, I think reverting the EDK2
changes is necessary regardless of what happens in the kernel. Or at the least,
somehow communicate to EDK2 users that ingesting those changes is a bad idea
unless the kernel has also been updated.

AFAIK, Bring Your Own Firmware[*] isn't widely adopted, which means that the CSP
is shipping the firmware. And shipping OVMF/EDK2 with the "ignores MTRRs" code
will cause problems for guests without commit 8e690b817e38 ("x86/kvm: Override
default caching mode for SEV-SNP and TDX"). Since the host doesn't control the
guest kernel, there's no way to know if deploying those EDK2 changes is safe.

[*] https://kvm-forum.qemu.org/2024/BYOF_-_KVM_Forum_2024_iWTioIP.pdf

> - Con: Creating more half support, when it's technically not required
> - Con: Still bouncing through the hypervisor

I assume by "Re-use MTRRs for the communication" you also mean updating the guest
to address the "everything is UC!" flaw, otherwise another con is:

- Con: Doesn't address the performance issue with TDX guests "using" UC
memory by default (unless there's yet more enabled).

Presumably that can be accomplished by simply skipping the CR0.CD toggling, and
doing MTRR stuff as nonrmal?

> - Pro: Design and code is clear
>
> 3. Create some new architectural definition, like a bit that means "MTRRs don't
> actually work:
> - Con: Takes a long time, need to get agreement
> - Con: Still bouncing through the hypervisor

Not for KVM guests. As I laid out in my bug report, it's safe to assume MTRRs
don't actually affect the memory type when running under KVM.

FWIW, PAT doesn't "work" on most KVM Intel setups either, because of misguided
KVM code that resulted in "Ignore Guest PAT" being set in all EPTEs for the
overwhelming majority of guests. That's not desirable long term because it
prevents the guest from using WC (via PAT) in situations where doing so is needed
for performance and/or correctness.

> - Pro: More pure solution

MTRRs "not working" is a red herring. The problem isn't that MTRRs don't work,
it's that the kernel is (somewhat unknowingly) using MTRRs as a crutch to get the
desired memtype for devices. E.g. for emulated MMIO, MTRRs _can't_ be virtualized,
because there's never a valid mapping, i.e. there is no physical memory and thus
no memtype. In other words, under KVM guests (and possibly other hypervisors),
MTRRs end up being nothing more than a communication channel between guest firmware
and the kernel.

The gap for CoCo VMs is that using MTRRs is undesirable because they are controlled
by the untrusted host. But that's largely a future problem, unless someone has a
clever way to fix the kernel mess.

> 4. Do this series:
> - Pro: Looks ok to me
> - Cons: As explained in the patches, it's a bit hacky.
> - Cons: Could there be more cases than the legacy PCI hole?
>
> I would kind of like to see something like 3, but 2 or 4 seem the only feasible
> ones if we want to resolve this soon.

Next message: Peilin Ye: "Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions"
Previous message: Abhinav Kumar: "Re: [PATCH v6 3/7] drm/msm/hdmi: make use of the drm_connector_hdmi framework"
In reply to: Edgecombe, Rick P: "Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX"
Next in thread: Edgecombe, Rick P: "Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]