Re: [BUG] workqueues and printk not playing nice since next-20240130

From: Paul E. McKenney
Date: Mon Feb 05 2024 - 16:22:11 EST


On Mon, Feb 05, 2024 at 07:46:48AM -1000, Tejun Heo wrote:
> On Mon, Feb 05, 2024 at 09:45:53AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 05, 2024 at 10:25:15PM +0900, Sergey Senozhatsky wrote:
> > > On (24/02/05 14:07), Petr Mladek wrote:
> > > > > Good point, if it does recur, I could try it on bare metal.
> > > >
> > > > Please let me, John, and Sergey know if anyone sees this again. I do
> > > > not feel comfortable when there is a problem which might silence the
> > > > consoles.
> > >
> > > Agreed.
> > >
> > > > Bisection identified this commit:
> > > > 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
> > >
> > > That commit triggered an early-boot use-after-free (per KASAN) on
> > > my system, which could derail some things.
> >
> > And enabling KASAN on next-20240130 got me that same KASAN report and
> > also suppressed the misbehavior, which is not surprising given that
> > KASAN quarantines freed memory for some time. Plus enabling KASAN
> > on recent -next does not trigger that KASAN report.
> >
> > So my guess is that we can attribute my oddball test failures to
> > that use-after-free. But I will of course continue testing.
>
> Can someone paste the KASAN report?

Here you go!

Thanx, Paul

------------------------------------------------------------------------

[ 0.316453] ==================================================================
[ 0.317646] BUG: KASAN: use-after-free in wq_update_node_max_active+0x123/0x810
[ 0.318851] Read of size 8 at addr ffff88802109d788 by task swapper/0/0
[ 0.319937]
[ 0.320195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240130 #7935
[ 0.321453] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 0.323299] Call Trace:
[ 0.323700] <TASK>
[ 0.324043] dump_stack_lvl+0x37/0x50
[ 0.324653] print_report+0xcb/0x620
[ 0.325249] ? wq_update_node_max_active+0x123/0x810
[ 0.326066] kasan_report+0xaf/0xe0
[ 0.326639] ? wq_update_node_max_active+0x123/0x810
[ 0.327455] kasan_check_range+0x39/0x1c0
[ 0.328119] wq_update_node_max_active+0x123/0x810
[ 0.328903] ? __pfx_mutex_lock+0x10/0x10
[ 0.329567] apply_wqattrs_commit+0x4e4/0xb80
[ 0.330289] ? __pfx_mutex_lock+0x10/0x10
[ 0.330946] apply_workqueue_attrs_locked+0x9e/0x110
[ 0.331764] alloc_workqueue+0xf76/0x18d0
[ 0.332432] ? __pfx_alloc_workqueue+0x10/0x10
[ 0.333189] ? kasan_unpoison+0x27/0x60
[ 0.333818] ? kasan_unpoison+0x27/0x60
[ 0.334455] ? __kasan_slab_alloc+0x30/0x70
[ 0.335147] ? __pfx_mutex_unlock+0x10/0x10
[ 0.335831] ? idr_alloc_u32+0x291/0x2c0
[ 0.336479] ? mutex_unlock+0x7e/0xd0
[ 0.337085] workqueue_init_early+0x69a/0xe70
[ 0.337800] ? __pfx_workqueue_init_early+0x10/0x10
[ 0.338605] ? kmem_cache_create_usercopy+0xcc/0x230
[ 0.339421] start_kernel+0x141/0x380
[ 0.340023] x86_64_start_reservations+0x18/0x30
[ 0.340788] x86_64_start_kernel+0xcf/0xe0
[ 0.341465] secondary_startup_64_no_verify+0x16d/0x17b
[ 0.342334] </TASK>
[ 0.342703]
[ 0.342954] The buggy address belongs to the physical page:
[ 0.343899] page:00000000a19a7ad3 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2109d
[ 0.345471] flags: 0x100000000000000(node=0|zone=1)
[ 0.346297] page_type: 0xffffffff()
[ 0.346882] raw: 0100000000000000 ffffea0000842748 ffffea0000842748 0000000000000000
[ 0.348184] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 0.349518] page dumped because: kasan: bad access detected
[ 0.350457]
[ 0.350706] Memory state around the buggy address:
[ 0.351532] ffff88802109d680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 0.352748] ffff88802109d700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 0.353968] >ffff88802109d780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 0.355221] ^
[ 0.355808] ffff88802109d800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 0.357161] ffff88802109d880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 0.358439] ==================================================================