Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

From: Stefan Haberland
Date: Thu Jan 11 2018 - 08:17:21 EST


On 11.01.2018 12:44, Christian Borntraeger wrote:

On 01/11/2018 10:13 AM, Ming Lei wrote:
On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
On 12/18/2017 02:56 PM, Stefan Haberland wrote:
On 07.12.2017 00:29, Christoph Hellwig wrote:
On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
t > commit 11b2025c3326f7096ceb588c3117c7883850c068ÂÂÂ -> bad
ÂÂÂÂ blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bccÂÂÂ -> good
ÂÂÂ genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).
That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
your either.

I wonder if there is some weird thing about nr_cpu_ids in s390?
--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

I tried this on my system and the blk-mq-hotplug-fix branch does not boot for me as well.
The disks get up and running and I/O works fine. At least the partition detection and EXT4-fs mount works.

But at some point in time the disk do not get any requests.

I currently have no clue why.
I took a dump and had a look at the disk states and they are fine. No error in the logs or in our debug entrys. Just empty DASD devices waiting to be called for I/O requests.

Do you have anything I could have a look at?
Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
- Jens' quick revert did fix the issue and did not broke DASD support but has some issues
with interrupt affinity.
- Christoph patch set fixes the hotplug issue for virtio blk but causes I/O hangs on DASDs (even
without hotplug).
Hello,

This one is a valid use case for VM, I think we need to fix that.

Looks there is issue on the fouth patch("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"), I fixed it in the following tree, and
the other 3 patches are same with Christoph's:

https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix

gitweb:
https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide the feedback?

BTW, if it can't help this issue, could you boot from a normal disk first
and dump blk-mq debugfs of DASD later?
That kernel seems to boot fine on my system with DASD disks.

--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html


I did some regression testing and it works quite well. Boot works, attaching CPUs during runtime on z/VM and enabling them in Linux works as well.
I also did some DASD online/offline CPU enable/disable loops.

Regards,
Stefan