RE: [REPORT] possible circular locking dependency when booting a VM on arm64 host

From: Salil Mehta
Date: Thu Jul 16 2020 - 04:15:07 EST


> From: Salil Mehta
> Sent: Thursday, July 16, 2020 1:53 AM
> To: 'Marc Zyngier' <maz@xxxxxxxxxx>; yuzenghui <yuzenghui@xxxxxxxxxx>
>
> > From: Marc Zyngier [mailto:maz@xxxxxxxxxx]
> > Sent: Wednesday, July 15, 2020 5:09 PM
> > To: yuzenghui <yuzenghui@xxxxxxxxxx>
> >
> > Hi Zenghui,
> >
> > On 2020-07-09 11:41, Zenghui Yu wrote:
> > > Hi All,
> > >
> > > I have seen the following lockdep splat when booting a guest on my
> > > Kunpeng 920 with GICv4 enabled. I can also trigger the same splat
> > > on v5.5, so it has been in the kernel for a while. I'm not sure
> > > what the exact problem is and hope someone can have a look!
> >
> > I can't manage to trigger this splat on my D05, despite running guests
> > with GICv4 enabled. A couple of questions below:
>
>
> Sorry, I forgot to update earlier, but I did try on Friday and could not
> manage to trigger it on D06/Kunpeng920 either. I used 5.8.0-rc4.
>
>
> > > Thanks,
> > > Zenghui
> > >
> > > [ 103.855511] ======================================================
> > > [ 103.861664] WARNING: possible circular locking dependency detected
> > > [ 103.867817] 5.8.0-rc4+ #35 Tainted: G W
> > > [ 103.872932] ------------------------------------------------------
> > > [ 103.879083] CPU 2/KVM/20515 is trying to acquire lock:
> > > [ 103.884200] ffff202fcd5865b0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x60/0xa0
> > > [ 103.893127]
> > > but task is already holding lock:
> > > [ 103.898933] ffff202fcfd07f58 (&rq->lock){-.-.}-{2:2}, at: __schedule+0x114/0x8b8
> > > [ 103.906301]
> > > which lock already depends on the new lock.
> > >
> > > [ 103.914441]
> > > the existing dependency chain (in reverse order) is:
> > > [ 103.921888]
> > > -> #3 (&rq->lock){-.-.}-{2:2}:
> > > [ 103.927438] _raw_spin_lock+0x54/0x70
> > > [ 103.931605] task_fork_fair+0x48/0x150
> > > [ 103.935860] sched_fork+0x100/0x268
> > > [ 103.939856] copy_process+0x628/0x1868
> > > [ 103.944106] _do_fork+0x74/0x710
> > > [ 103.947840] kernel_thread+0x78/0xa0
> > > [ 103.951917] rest_init+0x30/0x270
> > > [ 103.955742] arch_call_rest_init+0x14/0x1c
> > > [ 103.960339] start_kernel+0x534/0x568
> > > [ 103.964503]
> > > -> #2 (&p->pi_lock){-.-.}-{2:2}:
> > > [ 103.970224] _raw_spin_lock_irqsave+0x70/0x98
> > > [ 103.975080] try_to_wake_up+0x5c/0x5b0
> > > [ 103.979330] wake_up_process+0x28/0x38
> > > [ 103.983581] create_worker+0x128/0x1b8
> > > [ 103.987834] workqueue_init+0x308/0x3bc
> > > [ 103.992172] kernel_init_freeable+0x180/0x33c
> > > [ 103.997027] kernel_init+0x18/0x118
> > > [ 104.001020] ret_from_fork+0x10/0x18
> > > [ 104.005097]
> > > -> #1 (&pool->lock){-.-.}-{2:2}:
> > > [ 104.010817] _raw_spin_lock+0x54/0x70
> > > [ 104.014983] __queue_work+0x120/0x6e8
> > > [ 104.019146] queue_work_on+0xa0/0xd8
> > > [ 104.023225] irq_set_affinity_locked+0xa8/0x178
> > > [ 104.028253] __irq_set_affinity+0x5c/0x90
> > > [ 104.032762] irq_set_affinity_hint+0x74/0xb0
> > > [ 104.037540] hns3_nic_init_irq+0xe0/0x210 [hns3]
> > > [ 104.042655] hns3_client_init+0x2d8/0x4e0 [hns3]
> > > [ 104.047779] hclge_init_client_instance+0xf0/0x3a8 [hclge]
> > > [ 104.053760] hnae3_init_client_instance.part.3+0x30/0x68 [hnae3]
> > > [ 104.060257] hnae3_register_ae_dev+0x100/0x1f0 [hnae3]
> > > [ 104.065892] hns3_probe+0x60/0xa8 [hns3]
> >
> > Are you performing some kind of PCIe hot-plug here? Or is that done
> > at boot only? It seems to help trigger the splat.
>
>
> I am not sure how you could do that, since HNS3 is an integrated NIC,
> so physical hot-plug is definitely ruled out. local_pci_probe()
> should also get called when we insert the hns3_enet module, which
> eventually initializes the driver.
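For what it's worth, a plain module reload should be enough to re-run that
probe path without any physical hot-plug; a rough sketch (assuming the usual
hnae3/hclge/hns3 module split, so take the exact names with a grain of salt):

# Reloading the ENET module re-runs the PCI probe path
# (local_pci_probe() -> hns3_probe()) for each hns3 function:
rmmod hns3
modprobe hns3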

Or perhaps you meant below?

echo 1 > /sys/bus/pci/devices/xxxx:xx:xx.x/remove
echo 1 > /sys/bus/pci/rescan

The above is not being used; I did confirm this with Zenghui earlier.
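If it ever turns out to matter, the same sysfs knobs can be looped to
exercise the remove/rescan path repeatedly; a minimal sketch (the BDF below
is a placeholder, not the actual hns3 function):

#!/bin/sh
# Repeatedly remove one PCI function and rescan the bus via sysfs.
# BDF is a placeholder; substitute the real device before running.
BDF="0000:00:00.0"
for i in $(seq 1 10); do
        echo 1 > "/sys/bus/pci/devices/${BDF}/remove"
        sleep 1
        echo 1 > /sys/bus/pci/rescan
        sleep 2
done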