Re: Circular lockdep in kvm_reset_vcpu() ?

From: Marc Zyngier
Date: Mon Mar 13 2023 - 10:40:54 EST


On 2023-03-13 10:09, Cristian Marussi wrote:
On Sat, Feb 11, 2023 at 12:56:41AM +0000, Oliver Upton wrote:
Hi Jeremy,


Hi,

On Fri, Feb 10, 2023 at 11:46:36AM -0600, Jeremy Linton wrote:
> Hi,
>
> I saw this pop yesterday:

You and me both actually! Shame on me, I spoke off-list about this with
Marc in passing. Thanks for sending along the report.

> [ 78.333360] ======================================================
> [ 78.339541] WARNING: possible circular locking dependency detected
> [ 78.345721] 6.2.0-rc7+ #19 Not tainted
> [ 78.349470] ------------------------------------------------------
> [ 78.355647] qemu-system-aar/859 is trying to acquire lock:
> [ 78.361130] ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274
> [ 78.369344]
> [ 78.369344] but task is already holding lock:
> [ 78.375182] ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0

[...]

> It appears to be triggered by the new commit 42a90008f890a ('KVM: Ensure
> lockdep knows about kvm->lock vs. vcpu->mutex ordering rule') which is
> detecting the vcpu lock grabbed by kvm_vcpu_ioctl() and then the kvm mutex
> grabbed by kvm_reset_vcpu().
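To make the ordering rule concrete, here is a toy model of what lockdep does with it, assuming just two lock classes named after the ones in the splat. It records an edge each time one class is acquired while another is held, and reports a possible circular dependency when a new edge would close a cycle. This is an illustrative sketch only, not lockdep's implementation; the enum, array and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes for illustration; not kernel identifiers. */
enum lock_class { KVM_LOCK, VCPU_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[KVM_LOCK]   = "kvm->lock",
	[VCPU_MUTEX] = "vcpu->mutex",
};

/* dep[a][b] is true once some task has acquired class b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' in the recorded graph? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'want' while holding 'held'". If 'held' is already
 * reachable from 'want', the new edge would close a cycle: print a
 * lockdep-style warning and return false instead of recording it.
 */
static bool record_acquire(int held, int want)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(want >= 0 && want < NR_CLASSES);

	if (reachable(want, held, seen)) {
		printf("possible circular locking dependency: %s -> %s\n",
		       class_name[held], class_name[want]);
		return false;
	}
	dep[held][want] = true;
	return true;
}
```

Priming the kvm->lock -> vcpu->mutex edge first (the ordering the commit above teaches lockdep) and then replaying the kvm_reset_vcpu() path, vcpu->mutex -> kvm->lock, makes the second record_acquire() call return false.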

Right, this commit gave lockdep what it needed to smack us on the head
for getting the locking wrong on the arm64 side.

As gross as it might be, the right direction is likely to have our own
lock in kvm_arch that we can acquire while holding the vcpu mutex. I'll
throw a patch at the list once I get done testing it.
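A minimal userspace sketch of that direction, assuming pthread mutexes in place of kernel mutexes: the reset path, already holding the vCPU mutex, takes only a separate per-VM arch lock and never the VM-wide lock, so the order stays vm->lock, then vcpu->mutex, then the arch lock. The structures, the arch_lock field and the register-width check are hypothetical illustrations, not the actual arm64 code or patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical, simplified model (not the kernel's structures).
 * Lock order: vm->lock -> vcpu->mutex -> vm->arch_lock.
 */
struct vm {
	pthread_mutex_t lock;      /* stands in for kvm->lock; unused on the reset path */
	pthread_mutex_t arch_lock; /* hypothetical per-VM arch lock */
	int width;                 /* 0 = unset, else 32 or 64 */
};

struct vcpu {
	pthread_mutex_t mutex;     /* stands in for vcpu->mutex */
	struct vm *vm;
	int width;                 /* requested register width */
};

/* Called with vcpu->mutex held; takes only the innermost arch lock. */
static int vcpu_reset_locked(struct vcpu *vcpu)
{
	struct vm *vm = vcpu->vm;
	int ret = 0;

	assert(vcpu->width == 32 || vcpu->width == 64);

	pthread_mutex_lock(&vm->arch_lock);
	if (vm->width == 0)
		vm->width = vcpu->width;  /* first vCPU fixes the VM-wide width */
	else if (vm->width != vcpu->width)
		ret = -EINVAL;            /* mixed widths are rejected */
	pthread_mutex_unlock(&vm->arch_lock);
	return ret;
}

/* Models the ioctl path: take vcpu->mutex first, then reset. */
static int vcpu_ioctl_reset(struct vcpu *vcpu)
{
	int ret;

	pthread_mutex_lock(&vcpu->mutex);
	ret = vcpu_reset_locked(vcpu);
	pthread_mutex_unlock(&vcpu->mutex);
	return ret;
}
```

Because nothing under vcpu->mutex ever takes vm->lock, no vcpu->mutex -> vm->lock edge exists for lockdep to flag.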


I just hit this using a v6.3-rc2 and a mainline kvmtool.

In my case, though, the guest does not even boot if I use more than one
vCPU, which I suppose effectively triggers the reported possible deadlock,
i.e.:

/root/lkvm_master run -c 4 -m 4096 -k /root/Image_guest -d /root/disk_debian_buster_guest.img -p "loglevel=8"
# lkvm run -k /root/Image_guest -m 4096 -c 4 --name guest-288
....<HANGS FOREVER>

Pass earlycon to the guest for a start.

I seriously doubt someone has actually seen a deadlock, because
the issue has been there for at least the past 7 years...

And -rc2 works just fine here.

M.
--
Who you jivin' with that Cosmik Debris?