Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
From: Sean Christopherson
Date: Mon Feb 03 2025 - 15:33:30 EST

On Mon, Feb 03, 2025, Rick P Edgecombe wrote:
> On Fri, 2025-01-31 at 16:50 -0800, Sean Christopherson wrote:
> > Attempt to hack around the SNP/TDX guest MTRR disaster by hijacking
> > x86_platform.is_untracked_pat_range() to force the legacy PCI hole, i.e.
> > memory from TOLUD => 4GiB, as unconditionally writeback.
> >
> > TDX in particular has created an impossible situation with MTRRs. Because
> > TDX disallows toggling CR0.CD, TDX enabling decided the easiest solution
> > was to ignore MTRRs entirely (because omitting CR0.CD write is obviously
> > too simple).
> >
> > Unfortunately, under KVM at least, the kernel subtly relies on MTRRs to
> > make ACPI play nice with device drivers. ACPI tries to map ranges it finds
> > as WB, which in turn prevents device drivers from mapping device memory as
> > WC/UC-.
> >
> > For the record, I hate this hack. But it's the safest approach I can come
> > up with. E.g. forcing ioremap() to always use WB scares me because it's
> > possible, however unlikely, that the kernel could try to map non-emulated
> > memory (that is presented as MMIO to the guest) as WC/UC-, and silently
> > forcing those mappings to WB could do weird things.
> >
> > My initial thought was to effectively revert the offending commit and
> > skip the cache disabling/enabling, i.e. the problematic CR0.CD toggling,
> > but unfortunately OVMF/EDKII has also added code to skip MTRR setup. :-(
>
> Oof. The missing context in 8e690b817e38 ("x86/kvm: Override default caching
> mode for SEV-SNP and TDX"), is that since it is impossible to virtualize MTRRs
> on TDX private memory (in the old way KVM used to do it) and there was no
> upstream support yet, there looked like an opportunity to avoid strange "happens
> to work" support that normal VMs ended up with. Instead KVM could just not
> support TDX KVM MTRRs from the beginning. So part of the thinking was that we
> could drop all TDX KVM MTRR hacks. (which I guess turned out to be wrong).
>
> Since there is no upstream KVM TDX support yet, why isn't it an option to still
> revert the EDKII commit too? It was a relatively recent change.

I'm fine with that route too, but it too is a band-aid. Relying on the *untrusted*
hypervisor to essentially communicate memory maps is not a winning strategy.

> To me it seems that the normal KVM MTRR support is not ideal, because it is
> still lying about what it is doing. For example, in the past there was an
> attempt to use UC to prevent speculative execution accesses to sensitive data.
> The KVM MTRR support only happens to work with existing guests, but not all
> possible MTRR usages.
>
> Since diverging from the architecture creates loose ends like that, we could
> instead define some other way for EDKII to communicate the ranges to the kernel.
> Like some simple KVM PV MSRs that are for communication only, and not

Hard "no" to any PV solution. This isn't KVM specific, and as above, bouncing
through the hypervisor to communicate information within the guest is asinine,
especially for CoCo VMs.

> overlapping with architecture that expects to cause memory behavior. EDKII and
> the kernel could be changed to use them.