Re: [linuxwifi] x86/thermal: AB-BA dependency between mvm->mutex and tz->lock

From: Coelho, Luciano
Date: Thu Aug 03 2017 - 07:31:01 EST


On Thu, 2017-08-03 at 13:02 +0300, Kalle Valo wrote:
> "Coelho, Luciano" <luciano.coelho@xxxxxxxxx> writes:
>
> > On Thu, 2017-08-03 at 11:10 +0200, Jiri Kosina wrote:
> > > On Mon, 31 Jul 2017, Jiri Kosina wrote:
> > >
> > > > Hi,
> > > >
> > > > booting current Linus' tree, I'm seeing lockdep splat (see the end of this
> > > > mail).
> > > >
> > > > Apparently, there is AB-BA between tz->lock and mvm->mutex through the CPU
> > > > hotplug lock.
> > > >
> > > > The obvious dependency is: thermal_zone_get_temp() acquires tz->lock, and
> > > > then calls iwl_mvm_tzone_get_temp() (through tz->ops->get_temp()
> > > > callback), which acquires mvm->mutex
> > > >
> > > > The less obvious dependency is primarily caused by iwl_op_mode_mvm_start()
> > > > allocating workqueue (#2 stacktrace) while holding mvm->mutex (which is
> > > > broken, because that mutex is being taken also from CPU hotplug callback
> > > > path, hence the AB-BA).
> > >
> > > As the "central" part of the dependency is being added by iwlwifi driver
> > > (_iwl_pcie_rx_init() allocating workqueue while holding
> > > trans_pcie->mutex), I'm adding iwlwifi folks as well to CC.

[...]

> > > > -> #2 (cpu_hotplug_lock.rw_sem){++++++}:
> > > > lock_acquire+0xbd/0x220
> > > > cpus_read_lock+0x46/0x90
> > > > apply_workqueue_attrs+0x17/0x50
> > > > __alloc_workqueue_key+0x195/0x4d0
> > > > _iwl_pcie_rx_init+0x384/0x390 [iwlwifi]
> > > > iwl_pcie_rx_init+0x1e/0x380 [iwlwifi]
> > > > iwl_trans_pcie_start_fw+0x295/0x6f0 [iwlwifi]
> > > > iwl_mvm_load_ucode_wait_alive+0xe7/0x390 [iwlmvm]
> > > > iwl_run_init_mvm_ucode+0x84/0x320 [iwlmvm]
> > > > iwl_op_mode_mvm_start+0x964/0xd30 [iwlmvm]
> > > > _iwl_op_mode_start.isra.9+0x47/0xa0 [iwlwifi]
> > > > iwl_opmode_register+0xaa/0xd0 [iwlwifi]
> > > > iwl_mvm_init+0x37/0x1000 [iwlmvm]
> > > > do_one_initcall+0x51/0x1a9
> > > > do_init_module+0x60/0x20e
> > > > load_module+0x203f/0x2b50
> > > > SYSC_finit_module+0x96/0xd0
> > > > SyS_finit_module+0xe/0x10
> > > > entry_SYSCALL_64_fastpath+0x23/0xc2

Okay, so as I understand it the problem has been there for a long time,
but the splat is only coming up now because of Thomas' patch that adds
the lockdep map[1], right?

I see the workqueue allocation you mentioned. I'll try to move this
allocation outside the section that holds the mutex and see how it goes.
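
To make the idea concrete, here is a rough userspace sketch of the
reordering, not the actual driver change. It uses pthreads rather than
kernel primitives, and struct fake_wq, struct rx_ctx, fake_alloc_wq()
and rx_ctx_init() are all hypothetical names made up for illustration.
The point is only the ordering: do the allocation (which in the kernel
may take cpu_hotplug_lock) before taking the mutex, and hold the mutex
just long enough to publish the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; not a kernel type. */
struct fake_wq {
	int id;
};

/* Hypothetical stand-in for driver state guarded by a mutex. */
struct rx_ctx {
	pthread_mutex_t mutex;
	struct fake_wq *wq;
};

/*
 * Stand-in for the workqueue allocation. In the kernel this is the
 * call that may acquire cpu_hotplug_lock, so it must not run with
 * the driver mutex held.
 */
static struct fake_wq *fake_alloc_wq(void)
{
	return calloc(1, sizeof(struct fake_wq));
}

static int rx_ctx_init(struct rx_ctx *ctx)
{
	struct fake_wq *wq;

	assert(ctx != NULL);

	/* Allocate first, with no lock held. */
	wq = fake_alloc_wq();
	if (!wq)
		return -1;

	/* Take the mutex only to publish the pointer. */
	pthread_mutex_lock(&ctx->mutex);
	if (!ctx->wq) {
		ctx->wq = wq;
		wq = NULL; /* ownership moved into ctx */
	}
	pthread_mutex_unlock(&ctx->mutex);

	/* Drop the spare if a workqueue was already set; free(NULL) is a no-op. */
	free(wq);
	return 0;
}
```

With this ordering the allocation path never nests inside the mutex, so
the mutex -> cpu_hotplug_lock edge from the #2 stacktrace would not be
created. The cost is a possibly wasted allocation when the workqueue
already exists.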

[1] http://lkml.kernel.org/r/20170524081549.709375845@xxxxxxxxxxxxx

--
Cheers,
Luca.