Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

From: Paolo Bonzini
Date: Mon Sep 09 2024 - 09:25:37 EST


On 9/9/24 07:30, Yan Zhao wrote:
On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
On Wed, Sep 04, 2024, Yan Zhao wrote:
On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
Sean Christopherson <seanjc@xxxxxxxxxx> writes:

On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
Silver 4410Y".

Has this been reproduced on any other hardware besides SPR? I.e. did we stumble
on another hardware issue?

Very possible, as according to Yan Zhao this doesn't reproduce on at
least "Coffee Lake-S". Let me try to grab some random hardware around
and I'll be back with my observations.

Update some new findings from my side:

BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
from 0xfd000000 to 0xfe000000.

On "Sapphire Rapids XCC":

1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
correctly.
i.e.
	if (gfn >= 0xfd000 && gfn < 0xfe000) {
		return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
	}
	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

2. If KVM forces this fb_map range to be UC+IPAT, the installer fails to show and
gdm restarts endlessly (though on Coffee Lake-S, installer/gdm can launch
correctly in this case).

3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
this fb_map range as WC, with
iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, mem->size));

However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), the pfns for
this fb_map have already been reserved as UC- by ioremap().
As a result, the ioremap_wc() issued when GDM starts can only map the range as
UC- in the guest PAT.

So, with KVM setting WB (no IPAT) to this fb_map range, the effective
memory type is UC- and installer/gdm restarts endlessly.

4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs driver
to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
(didn't verify the installer's case as I can't update the driver in that case).

The reason is that the ioremap_wc() called when GDM starts no longer hits a
memtype conflict and can map the range as WC in the guest PAT.
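
The four cases above can be summarized with a small model of how the effective memory type is derived from the EPT memory type, the IPAT bit, and the guest PAT type. This is only a sketch, not KVM code: the enum and function names are made up for illustration, and only the two situations discussed here are modeled (IPAT set, or EPT memory type WB with IPAT clear). For the WB case it assumes the combining rule from the Intel SDM, where the guest PAT type wins and UC- resolves to UC, i.e. the range ends up uncacheable.

```c
#include <stdbool.h>

/* Memory type encodings as used by x86 PAT/EPT; names are illustrative. */
enum mem_type {
	MT_NOT_MODELED = -1,	/* combination outside the scope of this sketch */
	MT_UC = 0,		/* uncacheable */
	MT_WC = 1,		/* write-combining */
	MT_WT = 4,		/* write-through */
	MT_WP = 5,		/* write-protected */
	MT_WB = 6,		/* write-back */
	MT_UC_MINUS = 7,	/* UC-, a PAT-only encoding */
};

/*
 * Hypothetical helper: effective memory type seen by a guest access.
 * - IPAT set: the guest PAT is ignored and the EPT type is used as-is.
 * - IPAT clear, EPT type WB: the guest PAT type decides; UC- becomes UC.
 * - Anything else is deliberately left unmodeled.
 */
static enum mem_type effective_mem_type(enum mem_type ept_mt, bool ipat,
					enum mem_type guest_pat)
{
	if (ipat)
		return ept_mt;

	if (ept_mt == MT_WB)
		return guest_pat == MT_UC_MINUS ? MT_UC : guest_pat;

	return MT_NOT_MODELED;
}
```

With this model, case 1 is effective_mem_type(MT_WC, true, ...) and yields WC, case 2 yields UC, case 3 (WB, no IPAT, guest PAT UC-) yields an uncacheable mapping, and case 4 (WB, no IPAT, guest PAT WC) yields WC.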

Huh. The upside of this is that it sounds like there's nothing broken with WC
or self-snoop.
Considering a different perspective, the fb_map range is used as frame buffer
(vram), with the guest writing to this range and the host reading from it.
If the issue were related to self-snooping, we would expect the VNC window to
display distorted data. However, the observed behavior is that the GDM window
shows up correctly for a sec and restarts over and over.

So, do you think we can simply fix this issue by calling ioremap_wc() for the
frame buffer/vram range in bochs driver, as is commonly done in other gpu
drivers?

--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
 	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
 		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");

-	bochs->fb_map = ioremap(addr, size);
+	bochs->fb_map = ioremap_wc(addr, size);
 	if (bochs->fb_map == NULL) {
 		DRM_ERROR("Cannot map framebuffer\n");
 		return -ENOMEM;

While this is a fix for future kernels, it doesn't change the result for VMs already in existence.

I don't think there's an alternative to putting this behind a quirk.

Paolo