Perf top data indicates lock contention around the blk_mq_find_and_get_req() call:
1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
native_queued_spin_lock_slowpath
ret_from_fork
kthread
worker_thread
process_one_work
blk_mq_timeout_work
blk_mq_queue_tag_busy_iter
bt_iter
blk_mq_find_and_get_req
_raw_spin_lock_irqsave
native_queued_spin_lock_slowpath
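For context on the call chain above: in this area of v5.14, the bt_iter() callback used by blk_mq_queue_tag_busy_iter() looks up each in-flight request via blk_mq_find_and_get_req(), which takes tags->lock for every tag. The following is a pseudocode sketch from memory of the v5.14 tree, not exact source:

```
/* pseudocode sketch of v5.14 behavior, not exact source */
static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
					       unsigned int bitnr)
{
	struct request *rq;
	unsigned long flags;

	spin_lock_irqsave(&tags->lock, flags);	/* the contended lock */
	rq = tags->rqs[bitnr];
	if (!rq || !req_ref_inc_not_zero(rq))
		rq = NULL;
	spin_unlock_irqrestore(&tags->lock, flags);
	return rq;
}
```

With shared tags, the timeout work for every hctx serializes on the same tags->lock, which would explain the native_queued_spin_lock_slowpath samples in the profile above.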
Kernel v5.14 Data -
%Node1 : 8.4 us, 31.2 sy, 0.0 ni, 43.7 id, 0.0 wa, 0.0 hi, 16.8 si, 0.0 st
4.46% [kernel] [k] complete_cmd_fusion
3.69% [kernel] [k] megasas_build_and_issue_cmd_fusion
2.97% [kernel] [k] blk_mq_find_and_get_req
2.81% [kernel] [k] megasas_build_ldio_fusion
2.62% [kernel] [k] syscall_return_via_sysret
2.17% [kernel] [k] __entry_text_start
2.01% [kernel] [k] io_submit_one
1.87% [kernel] [k] scsi_queue_rq
1.77% [kernel] [k] native_queued_spin_lock_slowpath
1.76% [kernel] [k] scsi_complete
1.66% [kernel] [k] llist_reverse_order
1.63% [kernel] [k] _raw_spin_lock_irqsave
1.61% [kernel] [k] llist_add_batch
1.39% [kernel] [k] aio_complete_rw
1.37% [kernel] [k] read_tsc
1.07% [kernel] [k] blk_complete_reqs
1.07% [kernel] [k] native_irq_return_iret
1.04% [kernel] [k] __x86_indirect_thunk_rax
1.03% fio [.] __fio_gettime
1.00% [kernel] [k] flush_smp_call_function_queue
Test #2: Three VDs (each VD consists of 8 SAS SSDs).
(numactl -N 1 fio 3vd.fio --rw=randread --bs=4k --iodepth=32 --numjobs=8 --ioscheduler=none/mq-deadline)
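The 3vd.fio job file itself is not shown. A plausible equivalent sketch, assuming hypothetical /dev/sdb, /dev/sdc, /dev/sdd device names for the three VDs (the rw/bs/iodepth/numjobs values come from the command line):

```
; hypothetical 3vd.fio sketch -- device names are assumptions
[global]
direct=1
ioengine=libaio
group_reporting

[vd1]
filename=/dev/sdb

[vd2]
filename=/dev/sdc

[vd3]
filename=/dev/sdd
```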
There is a performance regression, but it is not due to this patch set:
kernel v5.11 gives 2.1M IOPS on mq-deadline, while v5.15 (without this
patchset) gives 1.8M IOPS.
In this test I did not notice the CPU lock-contention issue mentioned in Test-1.
In general, I noticed that host_busy is incorrect once I apply this patchset.
It should never exceed can_queue, but the sysfs host_busy value is much
higher than can_queue while IOs are running. This issue appears only after
applying this patchset.
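A quick way to check this while IOs are running is to compare the two sysfs attributes directly. A minimal sketch, assuming a hypothetical host0 (adjust the path to the megaraid_sas host on the system):

```shell
# Report whether host_busy exceeds can_queue for one SCSI host.
# $1 = sysfs scsi_host directory, e.g. /sys/class/scsi_host/host0
check_host_busy() {
    busy=$(cat "$1/host_busy")
    cq=$(cat "$1/can_queue")
    if [ "$busy" -gt "$cq" ]; then
        echo "BUG: host_busy=$busy exceeds can_queue=$cq"
    else
        echo "ok: host_busy=$busy can_queue=$cq"
    fi
}

# usage while fio is running (host number is an assumption):
#   check_host_busy /sys/class/scsi_host/host0
```

Sampling this in a loop while the fio job runs should show whether host_busy really climbs above can_queue.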
Does this patch set only change the behavior of <shared_host_tag>-enabled
drivers? Will there be any impact on the mpi3mr driver? I can test that as
well.