Re: [REGRESSION] GPU passes into VM improperly after c376a3456d8b or a98db518dde2

From: 70sp

Date: Tue Apr 14 2026 - 05:23:35 EST


I can confirm that the "domain is not compatible with device" message never appears.

I double-checked by also adding an else branch with a different message, and that one showed up several times (from pci (iGPU) 0000:00:02.0, pcieport 0000:00:01.0, and vfio-pci (GTX 970) 0000:01:00.0 and 0000:01:00.1), each time with ret = 0.


On Monday, April 13th, 2026 at 8:49 AM, Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> wrote:

> On 4/12/26 19:17, 70sp wrote:
> > Hello,
> >
> > I have been dealing with a regression launching a Windows QEMU/KVM
> > virtual machine with a GPU passed through.
> >
> > When I launch the QEMU/KVM VM, it hangs on a white screen for about
> > 2 minutes during boot, and then the GPU shows NVIDIA's Code 43 error
> > in Windows.
> >
> > I’m certain that the issue is not caused by anything in Windows or
> > related software in Linux: I reinstalled my whole PC, including the
> > Windows VM, and I also reproduced the bug on an out-of-the-box Arch
> > Linux install.
> >
> > The first bad commit is either a98db518dde246e01ead53617dc0a30d6aaa3752
> > or c376a3456d8bef43ec556a98c0a04c35086c2737. I don’t know for sure which
> > one introduced it, because during bisection I had to skip
> > a98db518dde246e01ead53617dc0a30d6aaa3752: on that commit the virtual
> > machine failed to launch with a different error (it didn’t even start
> > booting). On kernels before these commits, the VM works flawlessly.
> >
> > I have tested the latest mainline kernel and the issue is still
> > present. I have been experiencing it since kernel 6.13, so I switched
> > to the 6.12 LTS kernel, which doesn’t have this issue.
> >
> > Configuration of my Linux install and hardware: https://pastebin.com/rcsyyYiK
> > .config: https://pastebin.com/RTQCBduD
> > dmesg errors: https://pastebin.com/84jPP81E
> > lspci: https://pastebin.com/qi29BSWi
> >
> > #regzbot introduced: a98db518dde246e01ead53617dc0a30d6aaa3752..c376a3456d8bef43ec556a98c0a04c35086c2737
>
> Before these commits, if a device was attached to a domain that didn't
> perfectly match the hardware's capabilities (such as address width or
> coherency), the kernel would dynamically adjust the domain to
> accommodate the hardware.
>
> Following these two commits, the driver now applies a "match or fail"
> policy. If the domain is incompatible with the device's hardware
> capabilities, it returns -EINVAL. This expects the caller to allocate a
> new domain dedicated to that specific device and attempt the attachment
> again.
>
> Can you please add a message line in paging_domain_compatible() to
> verify whether it's a domain compatibility issue?
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 205debd76989..c7e1e0dfa250 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3111,8 +3111,10 @@ int paging_domain_compatible(struct iommu_domain *domain, struct device *dev)
>  		ret = paging_domain_compatible_second_stage(dmar_domain, iommu);
>  	else if (WARN_ON(true))
>  		ret = -EINVAL;
> -	if (ret)
> +	if (ret) {
> +		dev_info(dev, "domain is not compatible with device, ret = %d", ret);
>  		return ret;
> +	}
>  
>  	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
>  	    context_copied(iommu, info->bus, info->devfn))
>
> Thanks,
> baolu
>