RE: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

From: Kashyap Desai
Date: Thu Oct 07 2021 - 16:32:14 EST


> > -----Original Message-----
> > From: John Garry [mailto:john.garry@xxxxxxxxxx]
> > Sent: Tuesday, October 5, 2021 7:05 PM
> > To: Jens Axboe <axboe@xxxxxxxxx>; kashyap.desai@xxxxxxxxxxxx
> > Cc: linux-block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > ming.lei@xxxxxxxxxx; hare@xxxxxxx; linux-scsi@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory
> > footprint for shared sbitmap
> >
> > On 05/10/2021 13:35, Jens Axboe wrote:
> > >> Baseline is 1b2d1439fc25 (block/for-next) Merge branch
> > >> 'for-5.16/io_uring' into for-next
> > > Let's get this queued up for testing, thanks John.
> >
> > Cheers, appreciated
> >
> > @Kashyap, You mentioned that when testing you saw a performance
> > regression from v5.11 -> v5.12 - any idea on that yet? Can you
> > describe the scenario, like IO scheduler and how many disks and the
> > type? Does disabling host_tagset_enable restore performance?
>
> John - I am still working on this. System was not available due to some
> other
> debugging.

John -

I tested this patchset on 5.15-rc4 (master) -
https://github.com/torvalds/linux.git

#1 I noticed a performance regression with the mq-deadline scheduler which is
not related to this series. I will bisect and get more details about this
issue separately.
#2 With this patchset, I noticed one issue: CPU usage is high in certain
cases.

I ran the tests on the same setup using the same hardware: an Aero MegaRAID
controller.

Test #1: 24 SAS SSDs in JBOD mode.
(numactl -N 1 fio
24.fio --rw=randread --bs=4k --iodepth=256 --numjobs=1
--ioscheduler=none/mq-deadline)
No performance regression is noticed with this patchset; I can get 3.1M IOPS
(the max IOPS on this setup). However, I noticed CPU hogging when the iodepth
from the application is high.

CPU usage data (from top) -
%Node1 : 6.4 us, 57.5 sy, 0.0 ni, 23.7 id, 0.0 wa, 0.0 hi, 12.4 si, 0.0 st

Perf top data -
19.11% [kernel] [k] native_queued_spin_lock_slowpath
4.72% [megaraid_sas] [k] complete_cmd_fusion
3.70% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion
2.76% [megaraid_sas] [k] megasas_build_ldio_fusion
2.16% [kernel] [k] syscall_return_via_sysret
2.16% [kernel] [k] entry_SYSCALL_64
1.87% [megaraid_sas] [k] megasas_queue_command
1.58% [kernel] [k] io_submit_one
1.53% [kernel] [k] llist_add_batch
1.51% [kernel] [k] blk_mq_find_and_get_req
1.43% [kernel] [k] llist_reverse_order
1.42% [kernel] [k] scsi_complete
1.18% [kernel] [k] blk_mq_rq_ctx_init.isra.51
1.17% [kernel] [k] _raw_spin_lock_irqsave
1.15% [kernel] [k] blk_mq_get_driver_tag
1.09% [kernel] [k] read_tsc
0.97% [kernel] [k] native_irq_return_iret
0.91% [kernel] [k] scsi_queue_rq
0.89% [kernel] [k] blk_complete_reqs

The perf top data indicates lock contention around the
"blk_mq_find_and_get_req" call:

1.31% 1.31% kworker/57:1H-k [kernel.vmlinux] native_queued_spin_lock_slowpath
ret_from_fork
kthread
worker_thread
process_one_work
blk_mq_timeout_work
blk_mq_queue_tag_busy_iter
bt_iter
blk_mq_find_and_get_req
_raw_spin_lock_irqsave
native_queued_spin_lock_slowpath
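
For reference, here is a simplified sketch (not verbatim; based on my reading
of block/blk-mq-tag.c in this kernel range, details may differ) of the lookup
that shows up in the callchain above. bt_iter() calls it for every set bit, so
blk_mq_queue_tag_busy_iter() from the timeout work takes the same per-tagset
tags->lock once per in-flight request, which matches the contention seen in
perf:

/*
 * Sketch of blk_mq_find_and_get_req(), simplified from blk-mq-tag.c.
 * The spin_lock_irqsave() below is the lock that shows up as contended
 * in the callchain above.
 */
static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
					       unsigned int bitnr)
{
	struct request *rq;
	unsigned long flags;

	spin_lock_irqsave(&tags->lock, flags);
	rq = tags->rqs[bitnr];
	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
		rq = NULL;
	spin_unlock_irqrestore(&tags->lock, flags);
	return rq;
}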


Kernel v5.14 Data -

%Node1 : 8.4 us, 31.2 sy, 0.0 ni, 43.7 id, 0.0 wa, 0.0 hi, 16.8 si, 0.0 st
4.46% [kernel] [k] complete_cmd_fusion
3.69% [kernel] [k] megasas_build_and_issue_cmd_fusion
2.97% [kernel] [k] blk_mq_find_and_get_req
2.81% [kernel] [k] megasas_build_ldio_fusion
2.62% [kernel] [k] syscall_return_via_sysret
2.17% [kernel] [k] __entry_text_start
2.01% [kernel] [k] io_submit_one
1.87% [kernel] [k] scsi_queue_rq
1.77% [kernel] [k] native_queued_spin_lock_slowpath
1.76% [kernel] [k] scsi_complete
1.66% [kernel] [k] llist_reverse_order
1.63% [kernel] [k] _raw_spin_lock_irqsave
1.61% [kernel] [k] llist_add_batch
1.39% [kernel] [k] aio_complete_rw
1.37% [kernel] [k] read_tsc
1.07% [kernel] [k] blk_complete_reqs
1.07% [kernel] [k] native_irq_return_iret
1.04% [kernel] [k] __x86_indirect_thunk_rax
1.03% fio [.] __fio_gettime
1.00% [kernel] [k] flush_smp_call_function_queue


Test #2: Three VDs (each VD consists of 8 SAS SSDs).
(numactl -N 1 fio
3vd.fio --rw=randread --bs=4k --iodepth=32 --numjobs=8
--ioscheduler=none/mq-deadline)

There is a performance regression, but it is not due to this patchset.
Kernel v5.11 gives 2.1M IOPS with mq-deadline, but 5.15 (without this
patchset) gives 1.8M IOPS.
In this test I did not notice the CPU issue mentioned in Test #1.

In general, I noticed that host_busy is incorrect once I apply this patchset.
It should not be more than can_queue, but the sysfs host_busy value is very
high while IOs are running. This issue appears only after applying this
patchset.
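
For context, this is how I understand the host_busy accounting path (a
simplified sketch based on drivers/scsi/hosts.c, not verbatim). It is only a
guess on my side, but if the shared-tags rework leaves every tagset->tags[i]
pointing at the same tags structure, the iteration below would count each
in-flight command once per hw queue, which could explain the inflated sysfs
value:

/* Sketch of the sysfs host_busy path; names/signatures may not match exactly. */
static bool scsi_host_check_in_flight(struct request *rq, void *data,
				      bool reserved)
{
	int *count = data;
	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);

	if (test_bit(SCMD_STATE_INFLIGHT, &cmd->state))
		(*count)++;
	return true;
}

int scsi_host_busy(struct Scsi_Host *shost)
{
	int cnt = 0;

	/*
	 * Iterates tagset->tags[i] for every hw queue. If the shared-tags
	 * changes make all tags[i] reference the same tags, each in-flight
	 * request may be counted nr_hw_queues times (guess, not verified).
	 */
	blk_mq_tagset_busy_iter(&shost->tag_set, scsi_host_check_in_flight,
				&cnt);
	return cnt;
}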

Does this patchset only change the behavior of drivers with <shared_host_tag>
enabled? Will there be any impact on the mpi3mr driver? I can test that as
well.

Kashyap

>
> >
> > From checking differences between those kernels, I don't see anything
> > directly relevant in sbitmap support or in the megaraid sas driver.
> >
> > Thanks,
> > John
