Re: [PATCH V4 2/2] ublk_drv: add UBLK_IO_REFETCH_REQ for supporting to build as module

From: Ming Lei
Date: Mon Jul 11 2022 - 22:33:57 EST


Hi Gabriel,

On Mon, Jul 11, 2022 at 04:06:04PM -0400, Gabriel Krisman Bertazi wrote:
> Ming Lei <ming.lei@xxxxxxxxxx> writes:
>
> > Add UBLK_IO_REFETCH_REQ command to fetch the incoming io request in
> > ubq daemon context, so we can avoid to call task_work_add(), then
> > it is fine to build ublk driver as module.
> >
> > In this way, iops is affected a bit, but just by ~5% on ublk/null,
> > given io_uring provides pretty good batching issuing & completing.
> >
> > One thing to be careful is race between ->queue_rq() and handling
> > abort, which is avoided by quiescing queue when aborting queue.
> > Except for that, handling abort becomes much easier with
> > UBLK_IO_REFETCH_REQ since aborting handler is strictly exclusive with
> > anything done in ubq daemon kernel context.
>
> Hi Ming,
>
> FWIW, I'm not very fond of this change. It adds complexity to the kernel
> driver and to the userspace server implementation, which now have to deal

IMO, this approach only adds a few dozen lines of code, not much
complexity. The only real complexity in the ublk driver is in the abort
code, and that actually originates from the abort work running concurrently
with task work that may still run after the task has started exiting. But
any storage driver's abort/error handling code is complicated.

Using REFETCH_REQ actually makes abort handling much easier, since the
abort path runs strictly exclusively of any code running in ubq daemon
context, but it comes with a performance cost.

> with different interface semantics just because the driver was built-in
> or built as a module. I don't think the tristate support warrants such
> complexity. I was hoping we might get away with exporting that symbol
> or adding a built-in ubd-specific wrapper that can be exported and
> invokes task_work_add.

If task_work_add can be exported, that would be great.

Cc'ing more people who have contributed to task_work_add().

Oleg, Jens and others, I am wondering if you could share your opinion on
exporting task_work_add for ublk's use case? Then ublk_drv could be
built as a module without this patch.
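
For reference, the kind of built-in wrapper Gabriel mentions could look
roughly like the sketch below; the helper name ublk_task_work_add and its
placement are illustrative only, not part of this patch:

/* Illustrative sketch: a built-in, exported wrapper around task_work_add() */
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/task_work.h>

int ublk_task_work_add(struct task_struct *task, struct callback_head *work,
		       enum task_work_notify_mode notify)
{
	return task_work_add(task, work, notify);
}
EXPORT_SYMBOL_GPL(ublk_task_work_add);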

>
> Either way, Alibaba seems to consider this feature useful, and if that
> is the case, we can just not use it on our side.

If this newly added command proves useful, we may add it without tying it
to the modular build, provided task_work_add can be exported.

>
> That said, the patch looks good to me, just a minor comment inline.
>
> Thanks,
>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> > drivers/block/Kconfig | 2 +-
> > drivers/block/ublk_drv.c | 121 ++++++++++++++++++++++++++--------
> > include/uapi/linux/ublk_cmd.h | 17 +++++
> > 3 files changed, 113 insertions(+), 27 deletions(-)
> >
> > diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> > index d218089cdbec..2ba77fd960c2 100644
> > --- a/drivers/block/Kconfig
> > +++ b/drivers/block/Kconfig
> > @@ -409,7 +409,7 @@ config BLK_DEV_RBD
> > If unsure, say N.
> >
> > config BLK_DEV_UBLK
> > - bool "Userspace block driver"
> > + tristate "Userspace block driver"
> > select IO_URING
> > help
> > io uring based userspace block driver.
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 0076418e6fad..98482f8d1a77 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -92,6 +92,7 @@ struct ublk_queue {
> > int q_id;
> > int q_depth;
> >
> > + unsigned long flags;
> > struct task_struct *ubq_daemon;
> > char *io_cmd_buf;
> >
> > @@ -141,6 +142,15 @@ struct ublk_device {
> > struct work_struct stop_work;
> > };
> >
> > +#define ublk_use_task_work(ubq) \
> > +({ \
> > + bool ret = false; \
> > + if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) && \
> > + !((ubq)->flags & UBLK_F_NEED_REFETCH)) \
> > + ret = true; \
> > + ret; \
> > +})
> > +
>
> This should be an inline function, IMO.

You are right. I worried that the compiler might not be intelligent
enough to handle the code correctly. But I just tried an inline
function, and task_work_add() won't be linked in when CONFIG_BLK_DEV_UBLK
is set to m.
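
For reference, the straightforward inline conversion I tried looks roughly
like the sketch below (same check as the macro above):

/* Inline variant of ublk_use_task_work(); same logic as the macro above */
static inline bool ublk_use_task_work(const struct ublk_queue *ubq)
{
	return IS_BUILTIN(CONFIG_BLK_DEV_UBLK) &&
		!(ubq->flags & UBLK_F_NEED_REFETCH);
}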


Thanks,
Ming