Re: BUG: Invalid wait context with 5.7-rc2?

From: John Stultz
Date: Thu Apr 23 2020 - 13:58:47 EST


On Thu, Apr 23, 2020 at 10:29 AM John Stultz <john.stultz@xxxxxxxxxx> wrote:
>
> On Thu, Apr 23, 2020 at 10:19 AM Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
> > On 23/04/20 17:40, John Stultz wrote:
> > > Hey Folks,
> > >
> > > Recently, I've seen some occasional hangs earlyish in boot on my
> > > HiKey960 board with 5.7-rc1/rc2. The kernel isn't totally wedged as I
> > > will see some kernel messages (firmware loading failures, etc) much
> > > later if I leave it. But oddly sysrq doesn't respond.
> > >
> > > Figuring it must be some sort of deadlock, I added LOCKDEP and a bunch
> > > of other debug options and started booting in a loop. So far I've not
> > > been able to trigger the original problem, but I do see the following
> > > every boot:
> > >
> >
> > Interestingly I can't seem to reproduce that one with the latest master
> > (5.7.0-rc2-00115-g8c2e9790f196). Is that with some of the extra h960
> > patches?
>
> There are additional patches for hikey960, but nothing from the
> tracelog (which looked pretty generic).
> But I'll pull everything out to confirm and re-check against
> linus/master in case there's a recent fix.

Yep. Still seeing it against linus/master with nothing else:
[ 2.091225] clocksource: Switched to clocksource arch_sys_counter
[ 2.091406]
[ 2.098896] =============================
[ 2.102942] [ BUG: Invalid wait context ]
[ 2.106991] 5.7.0-rc2-00069-g18bf34080c4c #250 Not tainted
[ 2.112528] -----------------------------
[ 2.116574] swapper/1/0 is trying to lock:
[ 2.120710] ffffff82194af258 (&pool->lock){..-.}-{3:3}, at:
__queue_work+0x108/0x7c8
[ 2.128550] other info that might help us debug this:
[ 2.133649] context-{2:2}
[ 2.136290] 1 lock held by swapper/1/0:
[ 2.140159] #0: ffffffc011d049c8 (rcu_read_lock){....}-{1:3}, at:
__queue_work+0x48/0x7c8
[ 2.148513] stack backtrace:
[ 2.151421] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
5.7.0-rc2-00069-g18bf34080c4c #250
[ 2.159593] Hardware name: HiKey960 (DT)
[ 2.163554] Call trace:
[ 2.166023] dump_backtrace+0x0/0x1a8
[ 2.169719] show_stack+0x18/0x28
[ 2.173066] dump_stack+0xdc/0x148
[ 2.176500] __lock_acquire+0x5c0/0x1568
[ 2.180458] lock_acquire+0x100/0x378
[ 2.184159] _raw_spin_lock+0x64/0x108
[ 2.187942] __queue_work+0x108/0x7c8
[ 2.191637] queue_work_on+0xd0/0xf0
[ 2.195248] timers_update_nohz+0x28/0x38
[ 2.199296] tick_setup_sched_timer+0x110/0x178
[ 2.203870] hrtimer_run_queues+0x114/0x170
[ 2.208092] run_local_timers+0x30/0x70
[ 2.211963] update_process_times+0x28/0x58
[ 2.216189] tick_periodic+0x48/0x148
[ 2.219884] tick_handle_periodic+0x28/0xc8
[ 2.224112] arch_timer_handler_phys+0x2c/0x50
[ 2.225533] VFS: Disk quotas dquot_6.6.0
[ 2.228600] handle_percpu_devid_irq+0xe0/0x460
[ 2.228605] generic_handle_irq+0x30/0x48
[ 2.228608] __handle_domain_irq+0x88/0xf8
[ 2.228613] gic_handle_irq+0x5c/0xb0
[ 2.228619] el1_irq+0xf4/0x1c0
[ 2.232654] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 2.237146] arch_cpu_idle+0x2c/0x230
[ 2.237151] default_idle_call+0x20/0x44
[ 2.237156] do_idle+0x1ec/0x2d0
[ 2.237159] cpu_startup_entry+0x24/0x48
[ 2.241701] pnp: PnP ACPI: disabled
[ 2.245338] secondary_start_kernel+0x160/0x210