Re: Could you please help to have a look a bug trace in pmu arm-cci.c

From: Will Deacon
Date: Wed Jan 30 2019 - 13:21:34 EST


[+Suzuki and Robin]

On Mon, Jan 28, 2019 at 07:19:20AM +0000, Li, Meng wrote:
> With CONFIG_DEBUG_ATOMIC_SLEEP enabled, the trace below appears during the
> arm-cci PMU driver probe phase.
>
> [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> [ 1.983342] Preemption disabled at:
> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> [ 1.983364] Call trace:
> [ 1.983369] dump_backtrace+0x0/0x158
> [ 1.983372] show_stack+0x24/0x30
> [ 1.983378] dump_stack+0x80/0xa4
> [ 1.983383] ___might_sleep+0x138/0x160
> [ 1.983386] __might_sleep+0x58/0x90
> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> [ 1.983395] _mutex_lock+0x24/0x30
> [ 1.983400] perf_pmu_register+0x2c/0x388
> [ 1.983404] cci_pmu_probe+0x2bc/0x488
> [ 1.983409] platform_drv_probe+0x58/0xa8
>
> Because get_cpu() is invoked, preemption is disabled, so the trace is
> triggered when might_sleep() is called.

Hmm, the {get,put}_cpu() usage here looks very broken to me. Not only can
perf_pmu_register() sleep while preemption is disabled, but the assignment
to g_cci_pmu is done after we've re-enabled preemption, so there's a race
with CPU hotplug there too.
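
As a rough illustration of the first problem only, here is a userspace toy
model of the pattern in the trace. It is not kernel code: every model_*
name is a hypothetical stand-in (model_get_cpu() for get_cpu(),
model_pmu_register() for perf_pmu_register(), and so on), and the
"preempt count" is just an integer.

```c
/* Userspace toy model; every name here is a stand-in, not kernel code. */
#include <stdio.h>

static int preempt_count;      /* models the per-task preempt counter */
static int atomic_sleep_bugs;  /* sleeping calls made in atomic context */

static void model_get_cpu(void) { preempt_count++; }  /* disables preemption */
static void model_put_cpu(void) { preempt_count--; }  /* re-enables it */

/* Like might_sleep() with CONFIG_DEBUG_ATOMIC_SLEEP: complain if atomic. */
static void model_might_sleep(const char *who)
{
	if (preempt_count > 0) {
		atomic_sleep_bugs++;
		printf("BUG: sleeping function called from invalid context (%s)\n",
		       who);
	}
}

/* Like perf_pmu_register(): takes a mutex, so it may sleep. */
static void model_pmu_register(void)
{
	model_might_sleep("model_pmu_register");
}

/* Shape of the probe path in the trace: register while preemption is off. */
static int model_probe_atomic(void)
{
	atomic_sleep_bugs = 0;
	model_get_cpu();
	model_pmu_register();   /* sleeping call inside the get/put section */
	model_put_cpu();
	return atomic_sleep_bugs;
}

/* Same call made outside the get/put section: no complaint in the model. */
static int model_probe_preemptible(void)
{
	atomic_sleep_bugs = 0;
	model_pmu_register();
	return atomic_sleep_bugs;
}
```

Moving the call out of the section silences the warning in this model, but
it does nothing about the hotplug race described above.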

I don't think we can simply register the hotplug notifier before registering
the PMU, because we can't call into perf_pmu_migrate_context() until the PMU
has been registered. Perhaps we need to use the _cpuslocked() versions of
the hotplug notifier registration functions.
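
One possible reading of the _cpuslocked() idea, again as a userspace toy
model rather than kernel code: hold the CPU hotplug read lock (in the
kernel, cpus_read_lock()/cpus_read_unlock()) across both the PMU
registration and the hotplug callback registration (for example
cpuhp_setup_state_nocalls_cpuslocked()). All model_* names and example_pmu
are hypothetical, and whether this ordering is actually correct for
arm-cci is an untested assumption.

```c
/* Userspace toy model; every name here is a stand-in, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool cpus_locked;      /* models holding the CPU hotplug read lock */
static bool pmu_registered;
static bool hp_callback_set;
static void *model_g_pmu;     /* stands in for g_cci_pmu */
static int example_pmu;       /* dummy object standing in for the PMU */

static void model_cpus_read_lock(void)   { cpus_locked = true; }
static void model_cpus_read_unlock(void) { cpus_locked = false; }

/* Stand-in for perf_pmu_register(); sleeping is allowed under this lock. */
static int model_pmu_register(void)
{
	pmu_registered = true;
	return 0;
}

/* Stand-in for a *_cpuslocked() registration: caller must hold the lock. */
static int model_setup_state_cpuslocked(void)
{
	assert(cpus_locked);
	/* The callback may migrate the context, so the PMU must exist. */
	if (!pmu_registered || model_g_pmu == NULL)
		return -1;
	hp_callback_set = true;
	return 0;
}

static int model_probe(void *pmu)
{
	int ret;

	model_cpus_read_lock();      /* no CPU can go offline from here on */
	ret = model_pmu_register();
	if (ret == 0) {
		model_g_pmu = pmu;   /* published before hotplug can run */
		ret = model_setup_state_cpuslocked();
	}
	model_cpus_read_unlock();
	return ret;
}
```

The point of the ordering is that the PMU is registered and the global
pointer is set before any hotplug callback can fire, without disabling
preemption around a call that may sleep.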

I tried looking at some other drivers, but they all look broken to me, so
there's a good chance I'm missing something. Anybody know how this is
supposed to work?

Will