Re: [PATCH V2] virtio-net: fix the race between refill work and close

From: Jakub Kicinski
Date: Wed Jun 29 2022 - 22:51:34 EST


On Thu, 30 Jun 2022 10:08:04 +0800 Jason Wang wrote:
> +static void enable_refill_work(struct virtnet_info *vi)
> +{
> + spin_lock(&vi->refill_lock);
> + vi->refill_work_enabled = true;
> + spin_unlock(&vi->refill_lock);
> +}
> +
> +static void disable_refill_work(struct virtnet_info *vi)
> +{
> + spin_lock(&vi->refill_lock);
> + vi->refill_work_enabled = false;
> + spin_unlock(&vi->refill_lock);
> +}
> +
> static void virtqueue_napi_schedule(struct napi_struct *napi,
> struct virtqueue *vq)
> {
> @@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> }
>
> if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
> - if (!try_fill_recv(vi, rq, GFP_ATOMIC))
> - schedule_delayed_work(&vi->refill, 0);
> + if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
> + spin_lock(&vi->refill_lock);
> + if (vi->refill_work_enabled)
> + schedule_delayed_work(&vi->refill, 0);
> + spin_unlock(&vi->refill_lock);

Are you sure you can use the basic spin_lock() flavor in all cases?
Isn't the disable/enable called from a different context than this
thing here?

The entire delayed work construct seems a little risky because the work
may go to sleep after disabling napi, causing large latency spikes.
I guess you must have a good reason no to simply reschedule the NAPI
and keep retrying with GFP_ATOMIC...

Please add the target tree name to the subject.