Re: [PATCH net v3 1/3] virtio-net: don't schedule delayed refill worker
From: Bui Quang Minh
Date: Sat Jan 10 2026 - 03:23:43 EST
On 1/10/26 09:12, Jakub Kicinski wrote:
On Tue, 6 Jan 2026 22:04:36 +0700 Bui Quang Minh wrote:
When we fail to refill the receive buffers, we schedule a delayed worker
to retry later. However, this worker creates some concurrency issues.
For example, when the worker runs concurrently with virtnet_xdp_set,
both need to temporarily disable the queue's NAPI before enabling it
again. Without proper synchronization, a deadlock can happen when
napi_disable() is called on an already disabled NAPI. That
napi_disable() call will be stuck and so will the subsequent
napi_enable() call.

To simplify the logic and avoid further problems, we will instead retry
refilling in the next NAPI poll.

Happy to see this go FWIW. If it causes issues we should consider
adding some retry logic in the core (NAPI) rather than locally in
the driver.
Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Reported-by: Paolo Abeni <pabeni@xxxxxxxxxx>
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr

The Closes should probably point to Paolo's report. We'll wipe these CI
logs sooner or later but the lore archive will stick around.

I'll fix it in the next version.
@@ -3230,9 +3230,10 @@ static int virtnet_open(struct net_device *dev)

	for (i = 0; i < vi->max_queue_pairs; i++) {
		if (i < vi->curr_queue_pairs)
-			/* Make sure we have some buffers: if oom use wq. */
-			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+			/* Pre-fill rq aggressively, to make sure we are
+			 * ready to get packets immediately.
+			 */
+			try_fill_recv(vi, &vi->rq[i], GFP_KERNEL);

We should enforce _some_ minimal fill level at the time of open().
If the ring is completely empty no traffic will ever flow, right?
Perhaps I missed scheduling the NAPI somewhere..
The NAPI is enabled and scheduled in virtnet_napi_enable(). The code
path is like this:

virtnet_enable_queue_pair
  -> virtnet_napi_enable
    -> virtnet_napi_do_enable
      -> virtqueue_napi_schedule

The same happens in __virtnet_rx_resume().
	err = virtnet_enable_queue_pair(vi, i);
	if (err < 0)

Similar thing here? Tho not sure we can fail here..
@@ -3472,16 +3473,15 @@ static void __virtnet_rx_resume(struct virtnet_info *vi,
				 struct receive_queue *rq,
				 bool refill)
 {
-	bool running = netif_running(vi->dev);
-	bool schedule_refill = false;
+	if (netif_running(vi->dev)) {
+		/* Pre-fill rq aggressively, to make sure we are ready to
+		 * get packets immediately.
+		 */
+		if (refill)
+			try_fill_recv(vi, rq, GFP_KERNEL);

-	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
-		schedule_refill = true;
-	if (running)
		virtnet_napi_enable(rq);
-
-	if (schedule_refill)
-		schedule_delayed_work(&vi->refill, 0);
+	}
 }

nit: spurious new line
static void virtnet_rx_resume_all(struct virtnet_info *vi)
@@ -3829,11 +3829,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	}
 succ:
	vi->curr_queue_pairs = queue_pairs;
-	/* virtnet_open() will refill when device is going to up. */
-	spin_lock_bh(&vi->refill_lock);
-	if (dev->flags & IFF_UP && vi->refill_enabled)
-		schedule_delayed_work(&vi->refill, 0);
-	spin_unlock_bh(&vi->refill_lock);
+	if (dev->flags & IFF_UP) {
+		local_bh_disable();
+		for (int i = 0; i < vi->curr_queue_pairs; ++i)
+			virtqueue_napi_schedule(&vi->rq[i].napi, vi->rq[i].vq);
+
+		local_bh_enable();
+	}

	return 0;
 }

I'll delete it in the next version.