On Mon, Oct 21, 2024 at 02:30:01PM +0300, Sagi Grimberg wrote:
> On 21/10/2024 11:31, Ming Lei wrote:
> > On Mon, Oct 21, 2024 at 10:05:34AM +0300, Sagi Grimberg wrote:
> > > On 21/10/2024 4:39, Ming Lei wrote:
> > > > On Sun, Oct 20, 2024 at 10:40:41PM +0800, zhuxiaohui wrote:
> > > > > From: Zhu Xiaohui <zhuxiaohui.400@xxxxxxxxxxxxx>
> > > > >
> > > > > It is observed that nvme connect to a nvme over fabrics target will
> > > > > always fail when 'nohz_full' is set.
> > > > >
> > > > > Commit a46c27026da1 ("blk-mq: don't schedule block kworker on
> > > > > isolated CPUs") clears hctx->cpumask for all isolated CPUs, so when
> > > > > nvme connects to a remote target, it may fail with this stack:
> > > > >
> > > > >  blk_mq_alloc_request_hctx+1
> > > > >  __nvme_submit_sync_cmd+106
> > > > >  nvmf_connect_io_queue+181
> > > > >  nvme_tcp_start_queue+293
> > > > >  nvme_tcp_setup_ctrl+948
> > > > >  nvme_tcp_create_ctrl+735
> > > > >  nvmf_dev_write+532
> > > > >  vfs_write+237
> > > > >  ksys_write+107
> > > > >  do_syscall_64+128
> > > > >  entry_SYSCALL_64_after_hwframe+118
> > > > >
> > > > > because the given blk_mq_hw_ctx->cpumask has been cleared and no
> > > > > blk_mq_ctx is available on the hw queue.
> > > > >
> > > > > This patch introduces a new blk_mq_req_flags_t flag 'BLK_MQ_REQ_ARB_MQ',
> > > > > as well as a nvme_submit_flags_t flag 'NVME_SUBMIT_ARB_MQ', which
> > > > > indicate that the block layer can fall back to a blk_mq_ctx whose
> > > > > CPU is not isolated.
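
For illustration only, a minimal sketch of what such a fallback could look
like inside blk_mq_alloc_request_hctx(). The flag name is taken from the
patch above; the exact hook point and the cpu_is_isolated() scan are my
assumptions, not the patch's actual code:

	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids && (flags & BLK_MQ_REQ_ARB_MQ)) {
		int fallback;

		/*
		 * Hypothetical fallback: pick any online CPU that is not
		 * isolated, so a blk_mq_ctx is still reachable even after
		 * the hctx cpumask was stripped of isolated CPUs.
		 */
		for_each_online_cpu(fallback) {
			if (!cpu_is_isolated(fallback)) {
				cpu = fallback;
				break;
			}
		}
	}
	if (cpu >= nr_cpu_ids)
		goto out_queue_exit;
	data.ctx = __blk_mq_get_ctx(q, cpu);
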
> > > > ...
> > > >     cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
> > > > ...
> > > >
> > > > It can happen in case of non-cpu-isolation too, such as when this hctx
> > > > has no online CPUs; the two cases are actually the same from this
> > > > viewpoint.
> > > >
> > > > It is one long-time problem for nvme fc.
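
For context, the surrounding logic in blk_mq_alloc_request_hctx() looks
roughly like this in current mainline (paraphrased from block/blk-mq.c;
details vary by kernel version):

	data.hctx = xa_load(&q->hctx_table, hctx_idx);
	if (!blk_mq_hw_queue_mapped(data.hctx))
		goto out_queue_exit;
	/* fails whenever the hctx cpumask holds no online CPU, isolated or not */
	cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		goto out_queue_exit;
	data.ctx = __blk_mq_get_ctx(q, cpu);
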
> > > For what nvmf is using blk_mq_alloc_request_hctx() is not important. It
> > > just needs a tag from that hctx; the request execution runs wherever
> > > blk_mq_alloc_request_hctx() is running.
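
For reference, this is roughly how the connect path pins the allocation to
one hw queue (paraphrased from __nvme_submit_sync_cmd() in
drivers/nvme/host/core.c; exact signatures vary by kernel version):

	if (qid == NVME_QID_ANY)
		req = blk_mq_alloc_request(q, nvme_req_op(cmd), blk_flags);
	else
		req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), blk_flags,
						qid - 1);
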
> > The connection request needs to be issued to the hw queue & completed.
> > Without any online CPU for this hw queue, the request can't be completed
> > in case of managed-irq.
> None of the consumers of this API use managed-irqs. The networking stack
> takes care of steering irq vectors to online cpus.

OK, it looks not necessary to AND with cpu_online_mask in
blk_mq_alloc_request_hctx(); the behavior actually dates from commit
20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each
possisble CPU").

But it is still too tricky as one API. Please look at blk_mq_get_tag(),
which may allocate a tag from another hw queue instead of the specified
one.

It is just lucky for the connection request: because IO hasn't started yet
at that time, the allocation always succeeds in the 1st try of
__blk_mq_get_tag().

I am afraid that just one tag from the specified hw queue isn't enough.
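
For reference, the retry path in blk_mq_get_tag() that makes this possible
looks roughly like this (paraphrased from block/blk-mq-tag.c; details vary
by kernel version):

	do {
		tag = __blk_mq_get_tag(data, bt);
		if (tag != BLK_MQ_NO_TAG)
			break;
		if (data->flags & BLK_MQ_REQ_NOWAIT)
			return BLK_MQ_NO_TAG;
		...
		io_schedule();
		...
		/*
		 * After sleeping, ctx/hctx are re-read from the CPU the
		 * task wakes up on, so the tag may come from a different
		 * hw queue than the one the caller asked for.
		 */
		data->ctx = blk_mq_get_ctx(data->q);
		data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
					      data->ctx);
	} while (1);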