Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
From: Edgecombe, Rick P
Date: Mon Feb 03 2025 - 13:15:11 EST

On Fri, 2025-01-31 at 16:50 -0800, Sean Christopherson wrote:
> Attempt to hack around the SNP/TDX guest MTRR disaster by hijacking
> x86_platform.is_untracked_pat_range() to force the legacy PCI hole, i.e.
> memory from TOLUD => 4GiB, as unconditionally writeback.
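
For anyone following along, the range test being described boils down to something like the sketch below. This is not the code from the patch. The helper name is made up, TOLUD is passed in as a plain parameter, and the end address is assumed to be exclusive; all of that is for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)

/*
 * Hypothetical helper, not taken from the patch: returns true when
 * [start, end) lies entirely inside the legacy PCI hole [tolud, 4GiB).
 * 'tolud' is a parameter here purely to keep the sketch standalone.
 */
static bool range_in_legacy_pci_hole(uint64_t start, uint64_t end,
				     uint64_t tolud)
{
	assert(start <= end);

	/* Both bounds must fall within TOLUD..4GiB for the range to match. */
	return start >= tolud && end <= SZ_4G;
}
```

A range that straddles either TOLUD or the 4GiB boundary does not match in this sketch, so it would fall back to whatever the normal lookup decides.
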
>
> TDX in particular has created an impossible situation with MTRRs. Because
> TDX disallows toggling CR0.CD, TDX enabling decided the easiest solution
> was to ignore MTRRs entirely (because omitting CR0.CD write is obviously
> too simple).
>
> Unfortunately, under KVM at least, the kernel subtly relies on MTRRs to
> make ACPI play nice with device drivers. ACPI tries to map ranges it finds
> as WB, which in turn prevents device drivers from mapping device memory as
> WC/UC-.
>
> For the record, I hate this hack. But it's the safest approach I can come
> up with. E.g. forcing ioremap() to always use WB scares me because it's
> possible, however unlikely, that the kernel could try to map non-emulated
> memory (that is presented as MMIO to the guest) as WC/UC-, and silently
> forcing those mappings to WB could do weird things.
>
> My initial thought was to effectively revert the offending commit and
> skip the cache disabling/enabling, i.e. the problematic CR0.CD toggling,
> but unfortunately OVMF/EDKII has also added code to skip MTRR setup. :-(

Oof. The missing context in 8e690b817e38 ("x86/kvm: Override default caching
mode for SEV-SNP and TDX") is that, since it is impossible to virtualize MTRRs
on TDX private memory (in the old way KVM used to do it) and there was no
upstream support yet, it looked like an opportunity to avoid the strange
"happens to work" support that normal VMs ended up with. Instead, KVM could
simply not support MTRRs for TDX guests from the beginning. So part of the
thinking was that we could drop all the TDX KVM MTRR hacks (which I guess
turned out to be wrong).

Since there is no upstream KVM TDX support yet, why isn't reverting the EDKII
commit still an option too? It was a relatively recent change.

To me it seems that the normal KVM MTRR support is not ideal, because it is
still lying about what it is doing. For example, in the past there was an
attempt to use UC to prevent speculative execution accesses to sensitive data.
The KVM MTRR support happens to work with existing guests, but not with all
possible MTRR usages.

Since diverging from the architecture creates loose ends like that, we could
instead define some other way for EDKII to communicate the ranges to the
kernel, e.g. some simple KVM PV MSRs that are for communication only and do not
overlap with architectural MSRs that are expected to affect memory behavior.
EDKII and the kernel could be changed to use them.
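
To make the shape of that idea a bit more concrete, here is a rough sketch of how a single range boundary plus a memory type could be packed into one 64-bit MSR value. Everything about the layout is hypothetical (the pv_range_* names, the bit positions, the 4KiB alignment requirement); no such KVM PV MSR exists today. Only the memory type encodings are the usual x86 ones.

```c
#include <assert.h>
#include <stdint.h>

/* x86 memory type encodings, as used by MTRRs. */
enum memtype {
	MEMTYPE_UC = 0,
	MEMTYPE_WC = 1,
	MEMTYPE_WT = 4,
	MEMTYPE_WP = 5,
	MEMTYPE_WB = 6,
};

/* Hypothetical layout: bits 7:0 hold the type, bits 63:12 the address. */
#define PV_RANGE_TYPE_MASK	0xffULL
#define PV_RANGE_ADDR_MASK	(~0xfffULL)

/* Pack a 4KiB-aligned address and a memory type into one 64-bit value. */
static uint64_t pv_range_pack(uint64_t addr, enum memtype type)
{
	assert((addr & ~PV_RANGE_ADDR_MASK) == 0);
	return addr | (uint64_t)type;
}

/* Extract the address part of a packed value. */
static uint64_t pv_range_addr(uint64_t val)
{
	return val & PV_RANGE_ADDR_MASK;
}

/* Extract the memory type part of a packed value. */
static enum memtype pv_range_type(uint64_t val)
{
	return (enum memtype)(val & PV_RANGE_TYPE_MASK);
}
```

The point of such an interface would be that the values are informational only: firmware writes them, the kernel reads them, and neither side expects the write itself to change how memory behaves.
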