Re: WARNING: CPU: 3 PID: 1 at block/blk-mq-cpumap.c:90 blk_mq_map_hw_queues+0xf3/0x100

From: Steven Rostedt
Date: Wed Jan 22 2025 - 12:54:47 EST


On Wed, 22 Jan 2025 18:08:35 +0100
Daniel Wagner <dwagner@xxxxxxx> wrote:

> fallback:
> WARN_ON_ONCE(qmap->nr_queues > 1);
> blk_mq_clear_mq_map(...)
> }

I commented out the WARN_ON_ONCE() to see if I could finish my testing.
That got me past the warning, but much later in the tests it triggered this:

[ 813.092038] BUG: kernel NULL pointer dereference, address: 0000000000000090
[ 813.094136] #PF: supervisor read access in kernel mode
[ 813.095643] #PF: error_code(0x0000) - not-present page
[ 813.095643] PGD 0 P4D 0
[ 813.095643] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 813.095643] CPU: 1 UID: 0 PID: 22 Comm: cpuhp/1 Not tainted 6.13.0-test-01253-g66611c047570-dirty #27
[ 813.095643] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 813.095643] RIP: 0010:blk_mq_all_tag_iter+0x1a/0x270
[ 813.095643] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 57 41 56 41 55 49 89 fd 41 54 55 53 48 83 ec 50 <48> 8b 87 90 00 00 00 65 4c 8b 04 25 28 00 00 00 4c 89 44 24 48 49
[ 813.095643] RSP: 0018:ffffb668400d3da0 EFLAGS: 00010286
[ 813.095643] RAX: 0000000000000000 RBX: ffffa1f0408e7200 RCX: 0000000000000000
[ 813.095643] RDX: ffffb668400d3e28 RSI: ffffffffa1511d30 RDI: 0000000000000000
[ 813.095643] RBP: ffffb668400d3e28 R08: ffffa1f0bbc9c528 R09: 0000000000000001
[ 813.095643] R10: ffffa1f0408ec600 R11: 0000000000000001 R12: 0000000000000003
[ 813.095643] R13: 0000000000000000 R14: 0000000000000000 R15: ffffa1f0bbc9c528
[ 813.095643] FS: 0000000000000000(0000) GS:ffffa1f0bbc80000(0000) knlGS:0000000000000000
[ 813.095643] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 813.095643] CR2: 0000000000000090 CR3: 0000000104d42004 CR4: 0000000000170ef0
[ 813.095643] Call Trace:
[ 813.095643] <TASK>
[ 813.095643] ? __die+0x56/0x97
[ 813.095643] ? page_fault_oops+0xbe/0x250
[ 813.095643] ? search_extable+0x26/0x30
[ 813.095643] ? blk_mq_all_tag_iter+0x1a/0x270
[ 813.095643] ? search_module_extables+0x19/0x60
[ 813.095643] ? exc_page_fault+0x227/0x6d0
[ 813.095643] ? affine_move_task+0x26f/0x510
[ 813.095643] ? asm_exc_page_fault+0x26/0x30
[ 813.095643] ? __pfx_blk_mq_has_request+0x10/0x10
[ 813.095643] ? blk_mq_all_tag_iter+0x1a/0x270
[ 813.095643] ? xas_load+0xd/0xd0
[ 813.095643] ? xa_load+0x7b/0xb0
[ 813.095643] blk_mq_hctx_notify_offline+0xf1/0x1a0
[ 813.095643] ? __pfx_blk_mq_hctx_notify_offline+0x10/0x10
[ 813.095643] cpuhp_invoke_callback+0x214/0x420
[ 813.095643] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 813.095643] cpuhp_thread_fun+0x98/0x150
[ 813.095643] smpboot_thread_fn+0xdd/0x1d0
[ 813.095643] kthread+0xd2/0x100
[ 813.095643] ? __pfx_kthread+0x10/0x10
[ 813.095643] ret_from_fork+0x34/0x50
[ 813.095643] ? __pfx_kthread+0x10/0x10
[ 813.095643] ret_from_fork_asm+0x1a/0x30
[ 813.095643] </TASK>
[ 813.095643] Modules linked in:
[ 813.095643] CR2: 0000000000000090
[ 813.095643] ---[ end trace 0000000000000000 ]---
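
Reading the dump: RDI and R13 are both 0, CR2 is 0x90, and the faulting
bytes "<48> 8b 87 90 00 00 00" decode to "mov 0x90(%rdi),%rax". So it looks
like blk_mq_all_tag_iter() was handed a NULL first argument and faulted
reading a member at offset 0x90. Below is a minimal sketch of that
arithmetic. The struct is made up for illustration (it is NOT the real
struct blk_mq_tags layout), and fault_addr() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, not the kernel's struct blk_mq_tags. The padding
 * exists only to place a member at offset 0x90, matching CR2 in the oops.
 */
struct fake_tags {
	unsigned char pad[0x90];
	void *member;
};

/* Compile-time check that the illustrative member really sits at 0x90. */
static_assert(offsetof(struct fake_tags, member) == 0x90,
	      "member must sit at offset 0x90");

/*
 * Address that a load of tags->member would touch, computed without
 * dereferencing anything: base pointer plus the member offset.
 */
uintptr_t fault_addr(uintptr_t tags_ptr)
{
	return tags_ptr + offsetof(struct fake_tags, member);
}
```

With a NULL base, fault_addr(0) gives 0x90, which is the address reported
in CR2 above.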

It triggered after running mmiotrace, which shuts down CPUs and brings them
back up.

Not sure it's related. I can check how reproducible this is, and if it is, I
can try to bisect it.

-- Steve