On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
On 2021/5/20 at 10:13 PM, Stefan Hajnoczi wrote:
Request completion latency can be reduced by using polling instead of
irqs. Even Posted Interrupts or similar hardware support doesn't beat
polling. The reason is that disabling virtqueue notifications saves
critical-path CPU cycles on the host by skipping irq injection and in
the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
support to virtio_blk.
The approach taken by this patch differs from the NVMe driver's
approach. NVMe dedicates hardware queues to polling and submits
REQ_HIPRI requests only on those queues. This patch does not require
exclusive polling queues for virtio_blk. Instead, it switches between
irqs and polling when one or more REQ_HIPRI requests are in flight on a
virtqueue.
This is possible because toggling virtqueue notifications is cheap even
while the virtqueue is running. NVMe cqs can't do this because irqs are
only enabled/disabled at queue creation time.
This toggling approach requires no configuration. There is no need to
dedicate queues ahead of time or to teach users and orchestration tools
how to set up polling queues.
Possible drawbacks of this approach:
- Hardware virtio_blk implementations may find virtqueue_disable_cb()
  expensive since it requires DMA.

Note that it's probably not related to the behavior of the driver but the
design of the event suppression mechanism.
Device can choose to ignore the suppression flag and keep sending
interrupts.

Yes, it's the design of the event suppression mechanism.
If we use dedicated polling virtqueues then the hardware doesn't need to
check whether interrupts are enabled for each notification. However,
there's no mechanism to tell the device that virtqueue interrupts are
permanently disabled. This means that as of today, even dedicated
virtqueues cannot suppress interrupts without the device checking for
each notification.
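
To make the per-notification check concrete, here is a minimal userspace sketch in C of a split-ring device deciding whether to raise an interrupt. It is only a model under stated assumptions: `model_avail`, `MODEL_AVAIL_F_NO_INTERRUPT` and `device_should_interrupt()` are hypothetical names, the flag value mirrors VRING_AVAIL_F_NO_INTERRUPT from the split virtqueue layout, and VIRTIO_F_EVENT_IDX is assumed not to be negotiated. A hardware device would have to fetch `flags` from driver memory, which is the DMA cost mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver-owned header of a split virtqueue. */
struct model_avail {
	uint16_t flags;	/* written by the driver, read by the device */
	uint16_t idx;
};

/* Mirrors VRING_AVAIL_F_NO_INTERRUPT (value 1) from the split ring layout. */
#define MODEL_AVAIL_F_NO_INTERRUPT 1u

/*
 * Device side, called after a buffer has been marked used.
 * The flag is advisory: a device may ignore it and interrupt anyway.
 * There is no "permanently disabled" state, so the check is repeated
 * for every notification.
 */
static bool device_should_interrupt(const struct model_avail *avail)
{
	return !(avail->flags & MODEL_AVAIL_F_NO_INTERRUPT);
}
```

In this model the driver side of virtqueue_disable_cb() corresponds to setting the flag, and the device only learns about it the next time it reads `flags`.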

If such devices become popular then
the virtio_blk driver could use a similar approach to NVMe when
VIRTIO_F_ACCESS_PLATFORM is detected in the future.
- If a blk_poll() thread is descheduled it not only hurts polling
  performance but also delays completion of non-REQ_HIPRI requests on
  that virtqueue since vq notifications are disabled.

Can we poll only when only high pri requests are pending?

Yes, that's what this patch does.

If the backend is a remote one, I think the polling may cause more cpu
cycles.

Right, but polling is only done when userspace sets the RWF_HIPRI
request flag. Most applications don't support it and for those that do
it's probably an option that the user needs to enable explicitly.
Stefan
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index fc0fb1dcd399..f0243dcd745a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -29,6 +29,16 @@ static struct workqueue_struct *virtblk_wq;
 struct virtio_blk_vq {
 	struct virtqueue *vq;
 	spinlock_t lock;
+
+	/* Number of non-REQ_HIPRI requests in flight. Protected by lock. */
+	unsigned int num_lopri;
+
+	/* Number of REQ_HIPRI requests in flight. Protected by lock. */
+	unsigned int num_hipri;
+
+	/* Are vq notifications enabled? Protected by lock. */
+	bool cb_enabled;

We had event_flag_shadow, is it sufficient to introduce a new helper
virtqueue_cb_is_enabled()?

Yes, I'll try that in the next revision.

+
 	char name[VQ_NAME_LEN];
 } ____cacheline_aligned_in_smp;
@@ -171,33 +181,67 @@ static inline void virtblk_request_done(struct request *req)
 	blk_mq_end_request(req, virtblk_result(vbr));
 }
-static void virtblk_done(struct virtqueue *vq)
+/* Returns true if one or more requests completed */
+static bool virtblk_complete_requests(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
 	struct virtio_blk_vq *vbq = &vblk->vqs[vq->index];
 	bool req_done = false;
+	bool last_hipri_done = false;
 	struct virtblk_req *vbr;
 	unsigned long flags;
 	unsigned int len;
 	spin_lock_irqsave(&vbq->lock, flags);
+
 	do {
-		virtqueue_disable_cb(vq);
+		if (vbq->cb_enabled)
+			virtqueue_disable_cb(vq);
 		while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
 			struct request *req = blk_mq_rq_from_pdu(vbr);
+			if (req->cmd_flags & REQ_HIPRI) {
+				if (--vbq->num_hipri == 0)
+					last_hipri_done = true;
+			} else
+				vbq->num_lopri--;
+
 			if (likely(!blk_should_fake_timeout(req->q)))
 				blk_mq_complete_request(req);
 			req_done = true;
 		}
 		if (unlikely(virtqueue_is_broken(vq)))
 			break;
-	} while (!virtqueue_enable_cb(vq));
+
+		/* Enable vq notifications if non-polled requests remain */
+		if (last_hipri_done && vbq->num_lopri > 0) {
+			last_hipri_done = false;
+			vbq->cb_enabled = true;
+		}
+	} while (vbq->cb_enabled && !virtqueue_enable_cb(vq));
 	/* In case queue is stopped waiting for more buffers. */
 	if (req_done)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
 	spin_unlock_irqrestore(&vbq->lock, flags);
+
+	return req_done;
+}
+
+static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
+{
+	struct virtio_blk *vblk = hctx->queue->queuedata;
+	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
+
+	if (!virtqueue_more_used(vq))

I'm not familiar with block polling but what happens if a buffer is made
available after virtqueue_more_used() returns false here?

Can you explain the scenario, I'm not sure I understand? "buffer is made
available" -> are you thinking about additional requests being submitted
by the driver or an in-flight request being marked used by the device?

Stefan
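
The counter logic discussed in this thread can be modeled in userspace. The following is a minimal sketch, not the driver code: `model_vq`, `model_submit()` and `model_complete()` are hypothetical names. The completion side follows the quoted virtblk_complete_requests() hunk for a single completed request; the submission side does not appear in the quoted hunks, so it is an assumption based on the commit message (notifications are off while one or more REQ_HIPRI requests are in flight).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields added to struct virtio_blk_vq. */
struct model_vq {
	unsigned int num_lopri;	/* non-REQ_HIPRI requests in flight */
	unsigned int num_hipri;	/* REQ_HIPRI requests in flight */
	bool cb_enabled;	/* are vq notifications enabled? */
};

/*
 * Submission side. ASSUMPTION: not part of the quoted hunks; modeled
 * from the commit message only.
 */
static void model_submit(struct model_vq *vq, bool hipri)
{
	if (hipri) {
		vq->num_hipri++;
		vq->cb_enabled = false;	/* switch to polling */
	} else {
		vq->num_lopri++;
		if (vq->num_hipri == 0)
			vq->cb_enabled = true;	/* nobody is polling */
	}
}

/* Completion side: counter updates for one completed request. */
static void model_complete(struct model_vq *vq, bool hipri)
{
	bool last_hipri_done = false;

	if (hipri) {
		if (--vq->num_hipri == 0)
			last_hipri_done = true;
	} else {
		vq->num_lopri--;
	}

	/* Enable vq notifications if non-polled requests remain. */
	if (last_hipri_done && vq->num_lopri > 0)
		vq->cb_enabled = true;
}
```

With one non-polled and one polled request in flight, notifications stay off until the polled request completes, then come back on because a non-polled request is still outstanding.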