Re: [REGRESSION][BISECTED] Some applications under Windows 11 libvirt VM crash since commit 408eb7417a92c5354c7be34f7425b305dfe30ad9

From: Dave Hansen
Date: Wed Mar 05 2025 - 18:42:28 EST


On 3/5/25 15:30, Miguel Ruiz wrote:
> Hello everyone,
>
> I have encountered a regression in the kernel where certain applications
> under Windows 11 in libvirt completely crash. I've reported this on the
> Arch Linux forums <https://bbs.archlinux.org/viewtopic.php?pid=2229827>
> and we've narrowed the cause down to this diff
> <https://lore.kernel.org/all/20240808062937.1149-3-ravi.bangoria@xxxxxxx/>
> enabling bus lock support. Huge thanks to Christian Heusel, who went
> through the painstaking process of bisecting and building various
> kernel images for me to test.
>
> I'd like to list my desktop PC specifications for some context:
>
> * Motherboard - ASRock B650M Pro RS
> * Processor - AMD Ryzen 7 9800X3D
> * Graphics Card - NVIDIA GeForce RTX 3060 12GB
> * Memory - Silicon Power Zenith 64GB DDR5-6000 CL30
> * Storage -
>     o Western Digital SN580 1TB NVMe SSD (Arch Linux is here)
>     o Crucial MX300 750GB SATA SSD
>     o Seagate BarraCuda ST8000DM004 8TB SATA HDD
>
> My Windows 11 qcow image is on the NVMe drive and I'm passing through
> the other two SATA drives. I've pinned and isolated 7 cores from the
> host for use by the VM. My RTX 3060 is also passed through to the VM
> (and thus isolated from the host via the vfio-pci.ids flag). I share
> the mouse and keyboard via evdev.
>
> The issue does not manifest itself across all applications. For
> example, Firefox and the Epic Games client are unaffected, but the
> Steam client immediately exits (crashes?) as soon as you attempt to
> download, update, or uninstall a game. Certain games such as
> Yooka-Laylee also refuse to launch properly and exit immediately. On
> kernel versions prior to this commit, no such issue occurs and all
> applications work as expected. Furthermore, when booting the host
> system on modern kernel versions with the kernel argument
> "split_lock_detect=off", the issue goes away, providing more evidence
> that the commit mentioned above is the cause. This issue is also
> reproducible on a standard Windows 10 VM with no GPU passthrough or
> other special customization.
>
> Please let me know if there is more information I can provide. I'll be
> happy to help with any logs/reports or any other debug info and test any
> potential fixes. If a fix is found, please make sure to credit
> Christian Heusel like so:
>
> Bisected-by: Christian Heusel <christian@xxxxxxxxx>

I'm leaving all the context in because the original was sent as HTML,
which means it won't have made it to the lists.

I'm also cc'ing the KVM folks. I don't actually know how the #GP from
the split lock detection will manifest to KVM guests. I'm kinda
surprised it crashes the guest app and doesn't do anything more noisy.

Is there anything in the Linux logs (dmesg) of interest?
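
If it helps with digging through the logs, here is a minimal sketch in
Python of filtering saved dmesg output for split lock or bus lock
warnings. It assumes the relevant lines contain "split lock",
"split_lock", "bus lock" or "bus_lock"; the exact message wording
depends on the kernel version. The function name and the sample log
lines are made up for illustration.

```python
import re

# Assumption: warnings of interest mention a split lock or a bus lock,
# written with either a space or an underscore. Wording varies by kernel.
LOCK_PATTERN = re.compile(r"(split|bus)[ _]lock", re.IGNORECASE)


def find_lock_traps(log_text):
    """Return the log lines that mention a split lock or a bus lock."""
    return [line for line in log_text.splitlines() if LOCK_PATTERN.search(line)]


# Hypothetical sample input; real input would be the saved output of dmesg.
sample_log = "\n".join([
    "[  10.000000] usb 1-1: new high-speed USB device",
    "[  42.000000] example: task took a bus_lock trap at address: 0x0",
])

for hit in find_lock_traps(sample_log):
    print(hit)
```

To run it against a real log, read a file saved with "dmesg > dmesg.txt"
and pass its contents to find_lock_traps() in place of sample_log.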

We've had a few other reports of split lock detection triggering in
Linux applications. But, we've either just tuned down the aggressiveness
of the detection or told people to turn it off. We haven't treated split
lock detection activation as a regression up to this point, just a
feature working as intended.