Re: 4.16-RC7 WARNING: CPU: 2 PID: 0 at block/blk-mq.c:1400 __blk_mq_delay_run_hw_queue

From: Christian Borntraeger
Date: Tue Mar 27 2018 - 11:22:45 EST

On 03/27/2018 02:01 PM, Ming Lei wrote:
> Hi Stefan,
>
> On Tue, Mar 27, 2018 at 12:04:20PM +0200, Stefan Haberland wrote:
>> Hi,
>>
>> I get the following warning in __blk_mq_delay_run_hw_queue when the
>> scheduler is set to mq-deadline for DASD devices on s390.
>>
>> What I see is that for whatever reason there is a hctx nr 0 which has no
>> hctx->tags pointer set.
>> From my observation it is always hctx nr 0 which has a tags NULL pointer in
>> it and I see other hctx which have the hctx->tags pointer set correctly.
>>
>>
>> [    2.169986] WARNING: CPU: 0 PID: 0 at block/blk-mq.c:1402
>> __blk_mq_delay_run_hw_queue+0xe8/0x118
>> [    2.170007] Modules linked in:
>> [    2.170014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>> 4.16.0-rc7-04107-g91a05d9e1d6b-dirty #147
>> [    2.170019] Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
>> [    2.170024] Krnl PSW : 0000000076fd6c7f 00000000c244c24d
>> (__blk_mq_delay_run_hw_queue+0xe8/0x118)
>> [    2.170035]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
>> RI:0 EA:3
>> [    2.170041] Krnl GPRS: 00000000599ec58a 0000000002a94000 0000000002a94000
>> 0000000000000001
>> [    2.170047]            0000000000000000 000000006e761e98 0000000002a96980
>> 0000000002a96800
>> [    2.170052]            0000000002d87ce0 00000000737fbda8 0000000000000000
>> 0000000000000001
>> [    2.170058]            0000000000000001 0000000000aedd10 00000000737fbc38
>> 00000000737fbc00
>> [    2.170069] Krnl Code: 00000000006ea3c8: ebaff0a00004 lmg
>> %r10,%r15,160(%r15)
>>                           00000000006ea3ce: c0f4ffff5e0d brcl    15,6d5fe8
>>                          #00000000006ea3d4: a7f40001 brc     15,6ea3d6
>>                          >00000000006ea3d8: e340f0c00004 lg
>> %r4,192(%r15)
>>                           00000000006ea3de: ebaff0a00004 lmg
>> %r10,%r15,160(%r15)
>>                           00000000006ea3e4: 07f4 bcr     15,%r4
>>                           00000000006ea3e6: 41b01100 la      %r11,256(%r1)
>>                           00000000006ea3ea: 182a lr      %r2,%r10
>> [    2.170158] Call Trace:
>> [    2.170205] ([<0000000002a96800>] 0x2a96800)
>> [    2.170248]  [<00000000006ea4c0>] blk_mq_run_hw_queue+0xa0/0x100
>> [    2.170262]  [<00000000006ea59c>] blk_mq_run_hw_queues+0x7c/0x98
>> [    2.170295]  [<00000000006e88f6>] __blk_mq_complete_request+0x10e/0x1e0
>> [    2.170300]  [<00000000006e9e30>] blk_mq_complete_request+0x80/0xa0
>> [    2.170307]  [<000000000087fad0>] dasd_block_tasklet+0x218/0x480
>> [    2.170415]  [<000000000017c3f8>] tasklet_hi_action+0xa0/0x138
>> [    2.170434]  [<0000000000a91c10>] __do_softirq+0xc8/0x540
>> [    2.170471]  [<000000000017bd4e>] irq_exit+0x136/0x140
>> [    2.170478]  [<000000000010c912>] do_IRQ+0x8a/0xb8
>> [    2.170518]  [<0000000000a90ee0>] io_int_handler+0x138/0x2e0
>> [    2.170524]  [<0000000000102cd0>] enabled_wait+0x58/0x128
>> [    2.170562] ([<0000000000102cb8>] enabled_wait+0x40/0x128)
>> [    2.170577]  [<000000000010319a>] arch_cpu_idle+0x32/0x48
>> [    2.170604]  [<0000000000a8f636>] default_idle_call+0x3e/0x58
>> [    2.170613]  [<00000000001cd5d2>] do_idle+0xda/0x190
>> [    2.170621]  [<00000000001cd93e>] cpu_startup_entry+0x3e/0x48
>> [    2.170633]  [<0000000000e5ebf4>] start_kernel+0x47c/0x490
>> [    2.170641]  [<0000000000100020>] _stext+0x20/0x80
>> [    2.170650] 2 locks held by swapper/0/0:
>> [    2.170658]  #0:  (&(&dq->lock)->rlock){..-.}, at: [<00000000b45eaf9e>]
>> dasd_block_tasklet+0x1cc/0x480
>> [    2.170676]  #1:  (rcu_read_lock){....}, at: [<00000000bc7fa045>]
>> hctx_lock+0x34/0x110
>> [    2.170750] Last Breaking-Event-Address:
>> [    2.170758]  [<00000000006ea3d4>] __blk_mq_delay_run_hw_queue+0xe4/0x118
>> [    2.170803] ---[ end trace 1073cf0de1fd32d0 ]---
>
> This warning is harmless, please try the following patch:
>
> --
> From 7b2b5139bfef80f44d1b1424e09ab35b715fbfdb Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@xxxxxxxxxx>
> Date: Tue, 27 Mar 2018 19:54:23 +0800
> Subject: [PATCH] blk-mq: only run mapped hw queues in blk_mq_run_hw_queues()
>
> From commit 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule
> with each possisble CPU") on, it should be easier to see unmapped hctx
> in some CPU topo, such as, hctx may not be mapped to any CPU.
>
> This patch avoids the warning in __blk_mq_delay_run_hw_queue() by
> checking if the hctx is mapped in blk_mq_run_hw_queues().
>
> Reported-by: Stefan Haberland <sth@xxxxxxxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Fixes: 20e4d813931961fe ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

Can we get this into 4.16?

> ---
> block/blk-mq.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 16e83e6df404..48f25a63833b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1459,7 +1459,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  	int i;
>  
>  	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (blk_mq_hctx_stopped(hctx))
> +		if (blk_mq_hctx_stopped(hctx) || !blk_mq_hw_queue_mapped(hctx))
>  			continue;
>  
>  		blk_mq_run_hw_queue(hctx, async);
>
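
For anyone following along, here is a rough userspace sketch of what the
one-line change does. It is not kernel code: struct model_hctx and the
model_* helpers are hypothetical names made up for illustration, and
treating "mapped" as "has at least one software queue and a non-NULL tags
pointer" is an assumption based on Stefan's observation that hctx 0 has
no tags set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hardware queue context (not the kernel struct). */
struct model_hctx {
	unsigned int nr_ctx;	/* software queues mapped to this hctx */
	void *tags;		/* NULL when no tag set was assigned */
	bool stopped;		/* queue has been stopped */
	unsigned int runs;	/* how often this queue was run */
};

/* Assumption: a queue counts as mapped if it has CPUs and tags. */
static bool model_hctx_mapped(const struct model_hctx *h)
{
	return h->nr_ctx != 0 && h->tags != NULL;
}

/* Run every queue that is neither stopped nor unmapped. */
static void model_run_hw_queues(struct model_hctx *hctxs, size_t n)
{
	assert(hctxs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* The added check: skip unmapped queues instead of running them. */
		if (hctxs[i].stopped || !model_hctx_mapped(&hctxs[i]))
			continue;
		hctxs[i].runs++;
	}
}
```

With the check in place, a queue like hctx 0 from the report (no tags) is
skipped, so the code path that triggers the WARNING is never entered
for it.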