Re: [syzbot] [cgroups?] possible deadlock in __run_timer_base (2)

From: Peter Zijlstra
Date: Fri Feb 14 2025 - 05:20:13 EST


On Tue, Feb 11, 2025 at 09:14:12PM -0500, Waiman Long wrote:
> On 2/9/25 3:43 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 92514ef226f5 Merge tag 'for-6.14-rc1-tag' of git://git.ker..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=179453df980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=1909f2f0d8e641ce
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ed801a886dfdbfe7136d
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-92514ef2.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/c4d8b91f8769/vmlinux-92514ef2.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/c24ec4365966/bzImage-92514ef2.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+ed801a886dfdbfe7136d@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> This problem should be fixed by the following upstream patch once it is
> merged into mainline.
>
> https://lore.kernel.org/lkml/20250127013127.3913153-1-longman@xxxxxxxxxx/
>

AFAICT all these lockdep reports are caused by an earlier warning. Fix the
warning and the report goes away. Notably:

> > _printk+0xd5/0x120 kernel/printk/printk.c:2457
> > __report_bug lib/bug.c:195 [inline]
> > report_bug+0x346/0x500 lib/bug.c:219
> > handle_bug+0x60/0x90 arch/x86/kernel/traps.c:285
> > exc_invalid_op+0x1a/0x50 arch/x86/kernel/traps.c:309
> > asm_exc_invalid_op+0x1a/0x20 arch/x86/include/asm/idtentry.h:621
> > expire_timers kernel/time/timer.c:1827 [inline]

IOW, I think we're focusing on the wrong thing here.

> Peter, are you planning to merge this patch? This is another instance where
> the old way of calling wake_up_process() inside the lock critical region can
> lead to deadlock.

I still don't love the Changelog, but yeah, I suppose I can pick it up.
But I see Boqun has taken it, so I'll get it eventually.

No real hurry there, I suppose.