blk-mq actually has a built-in batching (or sort of) mechanism, which is enabled
if the hw queue is busy (hctx->dispatch_busy > 0). We use an EWMA to compute
hctx->dispatch_busy, and it is adaptive, even though the implementation is quite
coarse. There should be plenty of room for improvement, IMO.
This approach is reported to improve high-end SQ SCSI SSD performance
considerably [1], and MMC performance improves too [2].
[1] https://lore.kernel.org/linux-block/3cc3e03901dc1a63ef32e036182521af@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@xxxxxxxxxxxxxx/
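To make the EWMA remark above concrete, here is a minimal userspace sketch of
how a dispatch_busy-style value could be maintained. It is modeled on the idea
only: struct hw_queue, update_dispatch_busy(), queue_is_busy() and the two
constants are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed tuning constants for this sketch. */
#define EWMA_WEIGHT 8	/* history keeps (WEIGHT - 1) / WEIGHT of its value */
#define EWMA_FACTOR 4	/* a busy sample adds 1 << FACTOR before averaging */

/* Hypothetical stand-in for the per-hw-queue state. */
struct hw_queue {
	unsigned int dispatch_busy;
};

/* Fold one dispatch outcome (busy or not) into the integer EWMA. */
static void update_dispatch_busy(struct hw_queue *hq, bool busy)
{
	unsigned int ewma;

	assert(hq != NULL);
	ewma = hq->dispatch_busy;

	/* Idle queue that stays idle: nothing to decay. */
	if (!ewma && !busy)
		return;

	ewma *= EWMA_WEIGHT - 1;
	if (busy)
		ewma += 1u << EWMA_FACTOR;
	ewma /= EWMA_WEIGHT;

	hq->dispatch_busy = ewma;
}

/* Batching would be enabled only while the queue looks busy. */
static bool queue_is_busy(const struct hw_queue *hq)
{
	assert(hq != NULL);
	return hq->dispatch_busy > 0;
}
```

With these constants, a single busy sample raises the value from 0 to 2, and
two consecutive non-busy samples decay it back to 0, so batching switches off
quickly once the device stops pushing back. The integer arithmetic is what
makes it coarse.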
The i10 I/O scheduler builds upon recent work on [6]. We have tested the i10 I/O
scheduler with nvme-tcp optimizations [2,3] and batching dispatch [4], varying number
of cores, varying read/write ratios, and varying request sizes, and with NVMe SSD and
RAM block device. For NVMe SSDs, the i10 I/O scheduler achieves a ~60% improvement in
IOPS per core over the "noop" I/O scheduler. These results are available at [5],
and many additional results are presented in [6].
With the none scheduler, the nvme driver basically doesn't provide any queue-busy
feedback, so the built-in batching dispatch simply doesn't work.
The kyber scheduler uses I/O latency feedback to throttle and build I/O batches.
Can you compare i10 with kyber on nvme/nvme-tcp?
While other schedulers may also batch I/O (e.g., mq-deadline), the optimization target
in the i10 I/O scheduler is throughput maximization. Hence there is no latency target
nor a need for a global tracking context, so a new scheduler is needed rather than
building this functionality into an existing scheduler.
We currently use fixed default values as batching thresholds (e.g., 16 for #requests,
64KB for #bytes, and 50us for timeout). These default values are based on sensitivity
tests in [6]. For our future work, we plan to support adaptive batching according to
Frankly speaking, a hardcoded 16 requests or 64KB may not work everywhere,
and production environments could be much more complicated than your sensitivity
tests. If possible, please start with adaptive batching.
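For reference, the fixed-threshold policy under discussion can be sketched as
below. This is only an illustration of the three limits quoted above (16
requests, 64KB, 50us), not the i10 code: struct batch, batch_add() and
batch_should_dispatch() are hypothetical names, and measuring the timeout from
the arrival of the oldest queued request is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed defaults quoted in the patch description. */
#define BATCH_MAX_REQS		16u
#define BATCH_MAX_BYTES		(64u * 1024u)
#define BATCH_TIMEOUT_NS	(50u * 1000u)

/* Hypothetical per-queue batching state. */
struct batch {
	unsigned int nr_reqs;
	unsigned int nr_bytes;
	uint64_t first_ns;	/* arrival time of the oldest queued request */
};

/* Account one newly queued request of the given size. */
static void batch_add(struct batch *b, unsigned int bytes, uint64_t now_ns)
{
	assert(b != NULL);
	if (b->nr_reqs == 0)
		b->first_ns = now_ns;
	b->nr_reqs++;
	b->nr_bytes += bytes;
}

/* Dispatch once any one of the three thresholds is reached. */
static bool batch_should_dispatch(const struct batch *b, uint64_t now_ns)
{
	assert(b != NULL);
	if (b->nr_reqs == 0)
		return false;
	return b->nr_reqs >= BATCH_MAX_REQS ||
	       b->nr_bytes >= BATCH_MAX_BYTES ||
	       now_ns - b->first_ns >= BATCH_TIMEOUT_NS;
}
```

All three limits are compile-time constants here, which is exactly the
concern: a single 4KB request waits the full 50us regardless of how fast the
device is, whereas an adaptive scheme would adjust these limits from observed
load.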