Re: WARNING: CPU: 3 PID: 1 at block/blk-mq-cpumap.c:90 blk_mq_map_hw_queues+0xf3/0x100

From: Daniel Wagner
Date: Thu Jan 23 2025 - 07:55:20 EST


On Thu, Jan 23, 2025 at 08:59:57AM +0100, Daniel Wagner wrote:
> On Wed, Jan 22, 2025 at 05:58:17PM -0500, Steven Rostedt wrote:
> > On Wed, 22 Jan 2025 12:54:45 -0500
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > > Not sure it's related. I can see how reproducible this is, and if it is, I
> > > can try to bisect it.
> >
> > I bisected it down to: a5665c3d150c98 ("virtio: blk/scsi: replace
> > blk_mq_virtio_map_queues with blk_mq_map_hw_queues")
> >
> > And reverting that as well as:
> >
> > 9bc1e897a821f ("blk-mq: remove unused queue mapping helpers")
> >
> > It booted fine.
>
> In the previous tests did you just comment out the WARN_ON_ONCE or did you
> also replace blk_mq_clear_mq_map with blk_mq_map_queues? The
> blk_mq_clear_mq_map will map all queues to CPU 0 and if you offline CPU
> 0, there is nothing left to serve the hctx. I'll try to reproduce your
> test and see if my idea works.

I've reproduced the second crash as well, using your good old
stress-cpu-hotplug script. With blk_mq_clear_mq_map, all CPUs are mapped
to the first hctx, and when a CPU goes offline,
blk_mq_hctx_notify_offline is not happy about not finding any CPU
mapped to the hctx.
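
To illustrate the mapping problem, here is a rough userspace model. It is
not kernel code and it is not the patch. The NR_CPUS and NR_QUEUES values
and all model_* helper names are made up for this sketch, and the
round-robin spread is only a stand-in for a real per-CPU mapping, not what
blk_mq_map_queues actually computes.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   4	/* hypothetical CPU count for the model */
#define NR_QUEUES 4	/* hypothetical number of hardware queues */

/* Models blk_mq_clear_mq_map(): every CPU is pointed at queue 0. */
static void model_clear_mq_map(unsigned int mq_map[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		mq_map[cpu] = 0;
}

/* Stand-in for a real mapping: spread CPUs round-robin over the queues. */
static void model_spread_mq_map(unsigned int mq_map[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		mq_map[cpu] = cpu % NR_QUEUES;
}

/* Counts the online CPUs that are mapped to the given queue. */
static int model_online_cpus_for_queue(const unsigned int mq_map[NR_CPUS],
				       const bool online[NR_CPUS],
				       unsigned int queue)
{
	int n = 0;

	assert(queue < NR_QUEUES);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu] && mq_map[cpu] == queue)
			n++;
	return n;
}
```

In this model, after model_clear_mq_map every queue except the first has
no CPU mapped to it, regardless of which CPUs are online. With
model_spread_mq_map each queue has exactly one CPU, so a queue only loses
its CPU when that particular CPU goes offline.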

The patch below should fix your problem. I've tested the different
setups and all looked good to me.