Re: [External] Re: [PATCH] vduse: Fix msg list race in vduse_dev_read_iter

From: Eugenio Perez Martin

Date: Mon Feb 02 2026 - 04:30:11 EST


On Sat, Jan 31, 2026 at 8:15 AM Zhang Tianci
<zhangtianci.1997@xxxxxxxxxxxxx> wrote:
>
> Hi Eugenio,
>
> On Fri, Jan 30, 2026 at 7:52 PM Eugenio Perez Martin
> <eperezma@xxxxxxxxxx> wrote:
> >
> > On Fri, Jan 30, 2026 at 9:15 AM Zhang Tianci
> > <zhangtianci.1997@xxxxxxxxxxxxx> wrote:
> > >
> > > Move the message to recv_list before dropping msg_lock and copying the
> > > request to userspace, avoiding a transient unlinked state that can race
> > > with the msg_sync timeout path. Roll back to send_list on copy failures.
> > >
> >
> > This is missing a Fixes: tag and a Cc: stable@xxxxxxxxxxxxxxx. Or
> > maybe we should consider this a change in behavior? I don't think
> > any VDUSE instance should rely on never again receiving a message
> > that it has already partially received, but still...
>
> I'll add the Fixes tag and CC stable list in v2 patch.
>
> >
> >
> > > Signed-off-by: Zhang Tianci <zhangtianci.1997@xxxxxxxxxxxxx>
> > > Reviewed-by: Xie Yongji <xieyongji@xxxxxxxxxxxxx>
> > > ---
> > > drivers/vdpa/vdpa_user/vduse_dev.c | 30 ++++++++++++++++++++++--------
> > > 1 file changed, 22 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index ae357d014564c..b6a558341c06c 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -325,6 +325,7 @@ static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >         struct file *file = iocb->ki_filp;
> > >         struct vduse_dev *dev = file->private_data;
> > >         struct vduse_dev_msg *msg;
> > > +       struct vduse_dev_request req;
> > >         int size = sizeof(struct vduse_dev_request);
> > >         ssize_t ret;
> > >
> > > @@ -339,7 +340,7 @@ static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >
> > >                 ret = -EAGAIN;
> > >                 if (file->f_flags & O_NONBLOCK)
> > > -                       goto unlock;
> > > +                       break;
> > >
> > >                 spin_unlock(&dev->msg_lock);
> > >                 ret = wait_event_interruptible_exclusive(dev->waitq,
> > > @@ -349,17 +350,30 @@ static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >
> > >                 spin_lock(&dev->msg_lock);
> > >         }
> > > +       if (!msg) {
> > > +               spin_unlock(&dev->msg_lock);
> > > +               return ret;
> > > +       }
> > > +
> > > +       memcpy(&req, &msg->req, sizeof(req));
> > > +       /*
> > > +        * Move @msg to recv_list before dropping msg_lock.
> > > +        * This avoids a window where @msg is detached from any list and
> > > +        * vduse_dev_msg_sync() timeout path may operate on an unlinked node.
> >
> > But in the timeout case, msg->completed is false so list_del is never
> > called, isn't it?
>
> Did you misread it?

Yes, I probably misread the code :). I should not post on Friday evenings!

> Here's the code:
>
>         spin_lock(&dev->msg_lock);
>         if (!msg->completed) {          <- here msg->completed is false
>                 list_del(&msg->list);   <- boom
>                 msg->resp.result = VDUSE_REQ_RESULT_FAILED;
>                 /* Mark the device as malfunction when there is a timeout */
>                 if (!ret)
>                         vduse_dev_broken(dev);
>         }
>         ret = (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 0 : -EIO;
>         spin_unlock(&dev->msg_lock);
>
> >
> > Is there even any event that may cause more than one packet in either
> > queue? Maybe we can simplify a lot of this if we don't have that
> > assumption.
>
> That makes sense. However, I believe that even though such a scenario doesn't
> exist at present, we cannot depend on the queue containing just a single entry.
> This is a very fragile assumption and can easily be invalidated by modifications
> to upper-layer operations.
>

I didn't mean to change the external character device; that would be a
userland-visible change. I just meant to simplify the parts of the code
internal to the module.

For example, instead of having a list, have one permanent msg with an
enum state: free (as in "no message in flight"), sent_to_userland, and
completed. Then we don't need to worry about lists etc.

If vduse ever implements more than one message in flight, it is very
easy to come back to the list. Note that I'm also interested in
speeding up vdpa initialization, and allowing many msgs in flight is
one of the most straightforward ways to do it, so I'm not against that
in the long term :).

> >
> > > +        */
> > > +       vduse_enqueue_msg(&dev->recv_list, msg);

If we re-enqueue it, another read from another userland thread could
pick up the same message again, concurrently with this
vduse_dev_read_iter call, couldn't it?

> > >         spin_unlock(&dev->msg_lock);
> > > -       ret = copy_to_iter(&msg->req, size, to);
> > > -       spin_lock(&dev->msg_lock);
> > > +
> > > +       ret = copy_to_iter(&req, size, to);
> > >         if (ret != size) {
> > > +               spin_lock(&dev->msg_lock);
> > > +               /* Roll back: move msg back to send_list if still pending. */
> > > +               msg = vduse_find_msg(&dev->recv_list, req.request_id);
> > > +               if (msg)
> > > +                       vduse_enqueue_msg(&dev->send_list, msg);
> > > +               spin_unlock(&dev->msg_lock);
> > >                 ret = -EFAULT;
> > > -               vduse_enqueue_msg(&dev->send_list, msg);
> > > -               goto unlock;
> > >         }
> > > -       vduse_enqueue_msg(&dev->recv_list, msg);
> > > -unlock:
> > > -       spin_unlock(&dev->msg_lock);
> > >
> > >         return ret;
> > >  }
> > > --
> > > 2.39.5
> > >
> >
>
> I'll send out the v2 patch soon.
>
> Thanks,
> Tianci
>