Re: BUG: Invalid wait context with 5.7-rc2?

From: John Stultz
Date: Thu Apr 23 2020 - 13:29:56 EST


On Thu, Apr 23, 2020 at 10:19 AM Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
> On 23/04/20 17:40, John Stultz wrote:
> > Hey Folks,
> >
> > Recently, I've seen some occasional hangs fairly early in boot on my
> > HiKey960 board with 5.7-rc1/rc2. The kernel isn't totally wedged, as I
> > still see some kernel messages (firmware loading failures, etc.) much
> > later if I leave it running. But oddly, sysrq doesn't respond.
> >
> > Figuring it must be some sort of deadlock, I added LOCKDEP and a bunch
> > of other debug options and started booting in a loop. So far I've not
> > been able to trigger the original problem, but I do see the following
> > every boot:
> >
>
> Interestingly, I can't seem to reproduce that one with the latest master
> (5.7.0-rc2-00115-g8c2e9790f196). Is that with some of the extra h960
> patches?

There are additional patches for HiKey960, but none that touch anything
in the trace log (which looked pretty generic).
But I'll pull everything out to confirm and re-check against
linus/master in case there's a recent fix.

> I do get this however:
>
> [ 3.626638] INFO: trying to register non-static key.
> [ 3.626639] the code is fine but needs lockdep annotation.
> [ 3.626640] turning off the locking correctness validator.
> [ 3.626644] CPU: 7 PID: 51 Comm: kworker/7:1 Not tainted 5.7.0-rc2-00115-g8c2e9790f196 #116
> [ 3.626646] Hardware name: HiKey960 (DT)
> [ 3.626656] Workqueue: events deferred_probe_work_func
> [ 3.632476] sd 0:0:0:0: [sda] Optimal transfer size 8192 bytes not a multiple of physical block size (16384 bytes)
> [ 3.640220] Call trace:
> [ 3.640225] dump_backtrace+0x0/0x1b8
> [ 3.640227] show_stack+0x20/0x30
> [ 3.640230] dump_stack+0xec/0x158
> [ 3.640234] register_lock_class+0x598/0x5c0
> [ 3.640235] __lock_acquire+0x80/0x16c0
> [ 3.640236] lock_acquire+0xf4/0x4a0
> [ 3.640241] _raw_spin_lock_irqsave+0x70/0xa8
> [ 3.640245] uart_add_one_port+0x388/0x4b8
> [ 3.640248] pl011_register_port+0x70/0xf0
> [ 3.640250] pl011_probe+0x184/0x1b8
> [ 3.640254] amba_probe+0xdc/0x180
> [ 3.640256] really_probe+0xe0/0x338
> [ 3.640257] driver_probe_device+0x60/0xf8
> [ 3.640259] __device_attach_driver+0x8c/0xd0
> [ 3.640260] bus_for_each_drv+0x84/0xd8
> [ 3.640261] __device_attach+0xe4/0x140
> [ 3.640263] device_initial_probe+0x1c/0x28
> [ 3.640265] bus_probe_device+0xa4/0xb0
> [ 3.640266] deferred_probe_work_func+0x7c/0xb8
> [ 3.640269] process_one_work+0x2c0/0x768
> [ 3.640271] worker_thread+0x4c/0x498
> [ 3.640272] kthread+0x14c/0x158
> [ 3.640275] ret_from_fork+0x10/0x1c

Oof. Way to twist the knife :) I'm probably to blame for that
deferred_probe_work_func issue. I'll take a look at it.

thanks
-john