Re: kvm crash on 5.7-rc1 and later

From: Xiaoyao Li
Date: Sun Jul 12 2020 - 02:28:35 EST


On 7/12/2020 2:21 AM, Peter Zijlstra wrote:
On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
I am observing a 100% reproducible kvm crash on kernels starting with
5.7-rc1, always with the same opcode 0000.
It happens during wake up from the host suspended state. Worked OK on 5.6
and older.
The host is based on Debian testing, Thinkpad T440, i5 cpu.

[   61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
[   61.576672] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
[   61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
[   61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]

Crash results in a dead kvm and occasionally a very unstable system.

Bisecting the problem between v5.6 and v5.7-rc1 points to

commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
Author: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Date:   Sun Jan 26 12:05:35 2020 -0800

    x86/split_lock: Enable split lock detection by kernel

Reverting that patch seems to actually "cure" the issue.

The problem is present in all kernels past 5.7-rc1; however, the patch does
not revert cleanly in later source trees, so I cannot retest the logic on
recent kernels.

Peter, would you have any idea how to debug that (or even better - would you
happen to know the fix)?

I have attached dmesg logs from a "good" 5.6.9 kernel, and then from "bad"
5.7.0 and 5.8-rc3 kernels.

I have no clue about kvm. Nor do I actually have hardware with SLD on.
I've Cc'ed a bunch of folks who might have more ideas.


I think this bug is the same as the one found by Sean, and is already fixed in 5.8-rc4.

https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopherson@xxxxxxxxx/