Re: [syzbot] [crypto?] possible deadlock in alloc_workqueue

From: Daniel Jordan
Date: Mon Jul 15 2024 - 10:58:21 EST


On Sat, Jul 13, 2024 at 07:46:20AM GMT, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 82d01fe6ee52 Add linux-next specific files for 20240709
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ecf3b9980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=95a20e7acf357998
> dashboard link: https://syzkaller.appspot.com/bug?extid=2009b142f47c1e8fe762
...
> ============================================
> WARNING: possible recursive locking detected
> 6.10.0-rc7-next-20240709-syzkaller #0 Not tainted
> --------------------------------------------
> swapper/0/1 is trying to acquire lock:
> ffffffff8e1d19f0 (cpu_hotplug_lock){++++}-{0:0}, at: apply_wqattrs_lock kernel/workqueue.c:5134 [inline]
> ffffffff8e1d19f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0xb99/0x1ff0 kernel/workqueue.c:5719
>
> but task is already holding lock:
> ffffffff8e1d19f0 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc+0xaa/0x370 kernel/padata.c:1005
...
> stack backtrace:
> CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc7-next-20240709-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> print_deadlock_bug+0x483/0x620 kernel/locking/lockdep.c:3034
> check_deadlock kernel/locking/lockdep.c:3086 [inline]
> validate_chain+0x15e2/0x5920 kernel/locking/lockdep.c:3888
> __lock_acquire+0x1359/0x2000 kernel/locking/lockdep.c:5193
> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5816
> percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
> cpus_read_lock+0x42/0x150 kernel/cpu.c:490
> apply_wqattrs_lock kernel/workqueue.c:5134 [inline]
> alloc_workqueue+0xb99/0x1ff0 kernel/workqueue.c:5719
> padata_alloc+0xc3/0x370 kernel/padata.c:1007
> pcrypt_init_padata+0x27/0x100 crypto/pcrypt.c:327
> pcrypt_init+0x65/0xe0 crypto/pcrypt.c:352

This isn't an issue anymore.

A workqueue change in the linux-next tree above did cause this deadlock,
but it was already reported at

https://lore.kernel.org/all/CAJhGHyC=5FC1uFt0xzMwk42m=zm-_d9-OxoC4BQmSREAbAQrog@xxxxxxxxxxxxxx/

and the workqueue code now uses different locking to avoid it.
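
For anyone reading the trace later, the nesting lockdep complained about can
be sketched in user space as below. This is only a toy model under stated
assumptions: every model_* name is hypothetical and none of it is kernel
code. It mirrors the two acquisitions in the report (padata_alloc taking
cpu_hotplug_lock, then alloc_workqueue taking it again through
apply_wqattrs_lock) and a callee variant that does not take the lock. The
second variant illustrates the general idea of avoiding the nesting and is
not a description of the actual workqueue change.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space model; these are NOT kernel APIs. */
static int hotplug_read_depth;     /* read-side holds by the current task */
static int nested_acquire_reports; /* times a nested acquisition was seen */

/* Stand-in for cpus_read_lock(): flag an acquisition made while held. */
static void model_cpus_read_lock(const char *caller)
{
    if (hotplug_read_depth > 0) {
        nested_acquire_reports++;
        fprintf(stderr, "possible recursive locking in %s\n", caller);
    }
    hotplug_read_depth++;
}

/* Stand-in for cpus_read_unlock(). */
static void model_cpus_read_unlock(void)
{
    assert(hotplug_read_depth > 0);
    hotplug_read_depth--;
}

/* Callee that takes the lock itself, as alloc_workqueue does in the trace. */
static void model_alloc_workqueue_locking(void)
{
    model_cpus_read_lock("alloc_workqueue");
    model_cpus_read_unlock();
}

/* Assumed alternative: a callee that relies on some other lock instead. */
static void model_alloc_workqueue_no_hotplug_lock(void)
{
    /* no cpu_hotplug_lock acquisition here */
}

/* Caller pattern from the report: lock held across the allocation. */
static void model_padata_alloc(void (*alloc_wq)(void))
{
    model_cpus_read_lock("padata_alloc");
    alloc_wq();
    model_cpus_read_unlock();
}
```

Calling model_padata_alloc(model_alloc_workqueue_locking) records one nested
acquisition, while model_padata_alloc(model_alloc_workqueue_no_hotplug_lock)
records none. In the real kernel the nested read acquisition matters because
a writer queued between the two read-side acquisitions can make the second
one wait on a lock the same task already holds.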