Re: [PATCH V4 2/2] ublk_drv: add UBLK_IO_REFETCH_REQ for supporting to build as module
From: Ming Lei
Date: Mon Jul 11 2022 - 22:47:18 EST
On Tue, Jul 12, 2022 at 10:26:47AM +0800, Ziyang Zhang wrote:
> On 2022/7/12 04:06, Gabriel Krisman Bertazi wrote:
> > Ming Lei <ming.lei@xxxxxxxxxx> writes:
> >> Add UBLK_IO_REFETCH_REQ command to fetch the incoming io request in
> >> ubq daemon context, so we can avoid calling task_work_add(), and then
> >> it is fine to build the ublk driver as a module.
> >> In this way, iops is affected a bit, but just by ~5% on ublk/null,
> >> given io_uring provides pretty good batching issuing & completing.
> >> One thing to be careful about is the race between ->queue_rq() and
> >> handling abort, which is avoided by quiescing the queue when aborting it.
> >> Except for that, handling abort becomes much easier with
> >> UBLK_IO_REFETCH_REQ since aborting handler is strictly exclusive with
> >> anything done in ubq daemon kernel context.
> > Hi Ming,
> > FWIW, I'm not very fond of this change. It adds complexity to the kernel
> > driver and to the userspace server implementation, who now have to deal
> > with different interface semantics just because the driver was built-in
> > or built as a module. I don't think the tristate support warrants such
> > complexity. I was hoping we might get away with exporting that symbol
> > or adding a built-in ubd-specific wrapper that can be exported and
> > invokes task_work_add.
> > Either way, Alibaba seems to consider this feature useful, and if that
> > is the case, we can just not use it on our side.
> Our app handles IOs itself with network (RPC) and an internal memory pool,
> so UBLK_IO_REFETCH_REQ
> (actually I think it is like NEED_GET_DATA in the earliest version :) )
> is helpful to us because we can assign the data buffer address AFTER the app
> gets an IO request (WRITE, with data size), and we avoid PRE-allocating buffers.
Maybe you can consider switching to pre-allocation.
The patch for pinning io vm pages for the io lifetime is done, it is just
not included in this patchset. It passes all the builtin tests, but
there is still room for further optimization.
With that patchset in, io pages are pinned during the whole io handling time;
after the io is done, mm can reclaim these pages without needing to swap them
out. It works like madvise(MADV_DONTNEED).
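
For anyone not familiar with the madvise(MADV_DONTNEED) behavior referred to
above, here is a minimal userspace sketch of it. It is only an analogy, not
code from the pinning patch: the function name buffer_is_zero_after_dontneed()
is made up for illustration, and it assumes Linux with a private anonymous
mapping, where the kernel drops the pages right away (no swapout) and a later
access gets fresh zero-filled pages.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if every byte reads back as zero after MADV_DONTNEED,
 * 0 if stale data is still visible, -1 if mmap() or madvise() fails.
 */
int buffer_is_zero_after_dontneed(size_t len)
{
	unsigned char *buf;
	size_t i;
	int ret = 1;

	/* Stand-in for a pre-allocated io buffer. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch every page, as handling one io would. */
	memset(buf, 0xab, len);

	/* "io is done": let mm drop the pages instead of swapping them out. */
	if (madvise(buf, len, MADV_DONTNEED) != 0) {
		munmap(buf, len);
		return -1;
	}

	/* The mapping is still valid; the old contents are gone. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0) {
			ret = 0;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

Calling buffer_is_zero_after_dontneed(1 << 16) from a small main() is enough
to try it. The point is that the buffer address stays usable for the next io
while the memory behind it is given back, which is why pre-allocated buffers
need not cost resident memory between ios.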