Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
From: ShuangYu
Date: Sun Mar 22 2026 - 05:43:46 EST
> From: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> Date: Mon, Mar 2, 2026, 16:52
> Subject: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
> To: <linux-kernel@xxxxxxxxxxxxxxx>
> Cc: "ShuangYu" <shuangyu@xxxxxxxxx>, "Stefano Garzarella" <sgarzare@xxxxxxxxxx>, "Stefan Hajnoczi" <stefanha@xxxxxxxxxx>, "Jason Wang" <jasowang@xxxxxxxxxx>, "Eugenio Pérez" <eperezma@xxxxxxxxxx>, <kvm@xxxxxxxxxxxxxxx>, <virtualization@xxxxxxxxxxxxxxx>, <netdev@xxxxxxxxxxxxxxx>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it returns whether all entries have been
> consumed, which is usually the same. But not always - in
> drivers/vhost/net.c and when mergeable buffers have been enabled, the
> driver checks whether the combined entries are big enough to store an
> incoming packet. If not, the driver re-enables notifications with
> available entries still in the ring. The incorrect return value from
> vhost_get_avail_idx propagates through vhost_enable_notify and causes
> the host to livelock if the guest is not making progress, as vhost will
> immediately disable notifications and retry using the available entries.
>
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
>
> Reported-by: ShuangYu <shuangyu@xxxxxxxxx>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> Cc: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> ---
>
> Lightly tested, posting early to simplify testing for the reporter.
>
> drivers/vhost/vhost.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2f2c45d20883..db329a6f6145 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
> static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> {
> __virtio16 idx;
> + u16 avail_idx;
> int r;
>
> r = vhost_get_avail(vq, idx, &vq->avail->idx);
> @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
> }
>
> /* Check it isn't doing very strange thing with available indexes */
> - vq->avail_idx = vhost16_to_cpu(vq, idx);
> - if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> + avail_idx = vhost16_to_cpu(vq, idx);
> + if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
> vq_err(vq, "Invalid available index change from %u to %u",
> - vq->last_avail_idx, vq->avail_idx);
> + vq->last_avail_idx, avail_idx);
> return -EINVAL;
> }
>
> /* We're done if there is nothing new */
> - if (vq->avail_idx == vq->last_avail_idx)
> + if (avail_idx == vq->avail_idx)
> return 0;
>
> + vq->avail_idx = avail_idx;
> +
> /*
> * We updated vq->avail_idx so we need a memory barrier between
> * the index read above and the caller reading avail ring entries.
> --
> MST
>
Hi,

Sorry for the delay; it took me some time to build a reliable test
environment. I've verified the patch on 6.18.10, with 16 concurrent TCP
flows over virtio-net (TSO enabled, vCPU throttled to 1%, 300s per run):
Before the patch (measured with bpftrace plus a CPU monitor):
- vhost_discard_vq_desc: up to 3,716,272 calls per 3s window
- vhost_enable_notify false positives: up to 3,716,278 per 3s window
  (nearly 1:1 with discards; every partial allocation triggered a re-loop)
- vhost worker CPU: 0-99%, frequently 50-99%
- successful receives: as low as 137 per 3s window
After the patch:
- vhost_discard_vq_desc: 9-33 per 3s window
- vhost_enable_notify false positives: 0 (across all 100 sample windows)
- vhost worker CPU: 0% (all 149 sample points)
- successful receives: 873-2,667 per 3s window
The partial allocations still occur under load, but with the fix
vhost_get_avail_idx correctly returns 0 when no new entries have been
added, so the worker sleeps until the next guest kick instead of
spinning. Thanks.
Tested-by: ShuangYu <shuangyu@xxxxxxxxx>
Best regards,
ShuangYu