Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()

From: Stefan Hajnoczi
Date: Thu Jun 03 2021 - 11:24:29 EST


On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote:
>
> > On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> > > On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
> > > > Request completion latency can be reduced by using polling instead of
> > > > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > > > polling. The reason is that disabling virtqueue notifications saves
> > > > critical-path CPU cycles on the host by skipping irq injection and in
> > > > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > > > support to virtio_blk.
> > > >
> > > > The approach taken by this patch differs from the NVMe driver's
> > > > approach. NVMe dedicates hardware queues to polling and submits
> > > > REQ_HIPRI requests only on those queues. This patch does not require
> > > > exclusive polling queues for virtio_blk. Instead, it switches between
> > > > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > > > virtqueue.
> > > >
> > > > This is possible because toggling virtqueue notifications is cheap even
> > > > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > > > only enabled/disabled at queue creation time.
> > > >
> > > > This toggling approach requires no configuration. There is no need to
> > > > dedicate queues ahead of time or to teach users and orchestration tools
> > > > how to set up polling queues.
> > > >
> > > > Possible drawbacks of this approach:
> > > >
> > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> > > > expensive since it requires DMA.
> > >
> > > Note that it's probably not related to the behavior of the driver but the
> > > design of the event suppression mechanism.
> > >
> > > Device can choose to ignore the suppression flag and keep sending
> > > interrupts.
> > Yes, it's the design of the event suppression mechanism.
> >
> > If we use dedicated polling virtqueues then the hardware doesn't need to
> > check whether interrupts are enabled for each notification. However,
> > there's no mechanism to tell the device that virtqueue interrupts are
> > permanently disabled. This means that as of today, even dedicated
> > virtqueues cannot suppress interrupts without the device checking for
> > each notification.
>
>
> This can be detected via a transport specific way.
>
> E.g in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.

Nice idea :). Then there would be no need for changes to the hardware
interface. IRQ-less virtqueues could still be mentioned explicitly in
the VIRTIO spec so that driver/device authors are aware of the
VIRTIO_MSI_NO_VECTOR trick.
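
For what it's worth, the driver side can already express this today:
virtio_pci assigns VIRTIO_MSI_NO_VECTOR to any virtqueue whose callback
is NULL. A rough sketch of how virtio_blk could dedicate an irq-less
polled virtqueue that way (the function name and the req/req_poll vq
split are hypothetical, not part of this series):

static int virtblk_find_vqs_sketch(struct virtio_device *vdev,
                                   struct virtqueue *vqs[2])
{
        /*
         * A NULL callback makes virtio_pci write VIRTIO_MSI_NO_VECTOR
         * for that vq, which a device could treat as a hint that
         * interrupts will never be enabled on it.
         */
        vq_callback_t *callbacks[2] = {
                virtblk_done,   /* irq-driven vq: gets an MSI-X vector */
                NULL,           /* polled vq: VIRTIO_MSI_NO_VECTOR */
        };
        static const char * const names[2] = { "req", "req_poll" };

        return virtio_find_vqs(vdev, 2, vqs, callbacks, names, NULL);
}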

> > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > > > +{
> > > > +        struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > +        struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > > > +
> > > > +        if (!virtqueue_more_used(vq))
> > >
> > > I'm not familiar with block polling but what happens if a buffer is made
> > > available after virtqueue_more_used() returns false here?
> > Can you explain the scenario, I'm not sure I understand? "buffer is made
> > available" -> are you thinking about additional requests being submitted
> > by the driver or an in-flight request being marked used by the device?
>
>
> Something like that:
>
> 1) requests are submitted
> 2) poll, but virtqueue_more_used() returns false
> 3) the device makes the buffer used
>
> In this case, will poll() be triggered again by somebody else? (I think
> interrupt is disabled here).

Yes. An example blk_poll() user is
fs/block_dev.c:__blkdev_direct_IO_simple():

qc = submit_bio(&bio);
for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);
        if (!READ_ONCE(bio.bi_private))
                break;
        if (!(iocb->ki_flags & IOCB_HIPRI) ||
            !blk_poll(bdev_get_queue(bdev), qc, true))
                blk_io_schedule();
}

That's the infinite loop. The block layer implements the generic portion
of blk_poll(). blk_poll() calls mq_ops->poll() (virtblk_poll()).

So in general the polling loop will keep iterating, but there are
exceptions:
1. need_resched() causes blk_poll() to return 0 and blk_io_schedule()
will be called.
2. blk-mq has a fancier hybrid polling algorithm that estimates the I/O
completion time and sleeps until then to save CPU cycles. I haven't
looked at this one in detail.
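
To make those two exits concrete, here is a heavily simplified
paraphrase of the blk_poll() spin loop in block/blk-mq.c (stats,
plugging, and signal handling elided):

if (blk_mq_poll_hybrid(q, hctx, cookie))
        return 1;       /* exception 2: slept until estimated completion */

do {
        int ret = q->mq_ops->poll(hctx);        /* e.g. virtblk_poll() */

        if (ret > 0) {
                __set_current_state(TASK_RUNNING);
                return ret;     /* completions were found and processed */
        }
        if (ret < 0 || !spin)
                break;
        cpu_relax();
} while (!need_resched());      /* exception 1: give up the CPU */

__set_current_state(TASK_RUNNING);
return 0;       /* caller typically falls back to blk_io_schedule() */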

Both these cases affect existing mq_ops->poll() implementations (e.g.
NVMe). What's new in this patch series is that virtio-blk could have
non-polling requests on a virtqueue that now has irqs disabled. So if
polling stops, those requests could be left waiting with no irq to
complete them.

I think there's an easy solution for this: don't disable virtqueue irqs
while non-REQ_HIPRI requests are in flight. The disadvantage is that
we'll keep irqs enabled in more situations, so the performance
improvement may not apply in some configurations.
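
A minimal sketch of that idea, with made-up names (the num_irq_reqs
counter and virtblk_update_cb() helper are illustrative, not part of
this patch):

struct virtio_blk_vq {
        struct virtqueue *vq;
        spinlock_t lock;
        unsigned int num_irq_reqs;      /* in-flight non-REQ_HIPRI requests */
};

/* Call with vbq->lock held whenever a request is queued or completed */
static void virtblk_update_cb(struct virtio_blk_vq *vbq, bool hipri_in_flight)
{
        /*
         * Only suppress notifications when every in-flight request will
         * be reaped by a poller; otherwise keep irqs enabled so that
         * non-polling requests still complete via the irq handler.
         */
        if (hipri_in_flight && vbq->num_irq_reqs == 0)
                virtqueue_disable_cb(vbq->vq);
        else
                virtqueue_enable_cb(vbq->vq);
}

The submit and complete paths would then increment/decrement
num_irq_reqs under the vq lock and call the helper.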

Stefan
