Re: [PATCH v3 0/5] nvdimm: virtio_pmem: fix request lifetime and converge broken queue failures
From: Li Chen
Date: Tue Jun 09 2026 - 08:15:33 EST

Hi Alison,

---- On Tue, 02 Jun 2026 09:51:26 +0800 Alison Schofield <alison.schofield@xxxxxxxxx> wrote ---
> On Thu, Feb 26, 2026 at 10:57:05AM +0800, Li Chen wrote:
> > Hi,
> >
> > The virtio-pmem flush path uses a virtqueue cookie/token to carry a
> > per-request context through completion. Under broken virtqueue / notify
> > failure conditions, the submitter can return and free the request object
> > while the host/backend may still complete the published request. The IRQ
> > completion handler then dereferences freed memory when waking waiters,
> > which is reported by KASAN as a slab-use-after-free and may manifest as
> > lock corruption (e.g. "BUG: spinlock already unlocked") without KASAN.
> >
> > In addition, the flush path has two wait sites: one for virtqueue
> > descriptor availability (-ENOSPC from virtqueue_add_sgs()) and one for
> > request completion. If the virtqueue becomes broken, forward progress is
> > no longer guaranteed and these waiters may sleep indefinitely unless the
> > driver converges the failure and wakes all wait sites.
> >
> > This series addresses both issues:
> >
> > 1/5 nvdimm: virtio_pmem: always wake -ENOSPC waiters
> > Wake one -ENOSPC waiter for each reclaimed used buffer, decoupled from
> > token completion.
> >
> > 2/5 nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
> > Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and
> > wq_buf_avail).
> >
> > 3/5 nvdimm: virtio_pmem: refcount requests for token lifetime
> > Refcount request objects so the token lifetime spans the window where it
> > is reachable through the virtqueue until completion/drain drops the
> > virtqueue reference.
> >
> > 4/5 nvdimm: virtio_pmem: converge broken virtqueue to -EIO
> > Track a device-level broken state to converge broken/notify failures to
> > -EIO: wake all waiters and drain/detach outstanding requests to complete
> > them with an error, and fail-fast new requests.
> >
> > 5/5 nvdimm: virtio_pmem: drain requests in freeze
> > Drain outstanding requests in freeze() before tearing down virtqueues so
> > waiters do not sleep indefinitely.
> >
> > Testing was done on QEMU x86_64 with a virtio-pmem device exported as
> > /dev/pmem0, formatted with ext4 (-O fast_commit), mounted with DAX, and
> > stressed with fsync-heavy workloads.
> >
> > Thanks,
> > Li Chen
>
> Hi Li Chen,
>
> Today I took a look at this set, noting that it's been sitting idle
> in our nvdimm backlog for a while. I'm not able to apply it. Can you
> post a new rev that applies to 7.1-rc6 ?
>
> Thanks,
> Alison

Sorry for the late reply. I have just sent v4
(https://lore.kernel.org/all/20260609120726.1714780-1-me@linux.beauty/),
which applies to 7.1-rc7. Thanks for taking a look.
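
For anyone skimming the thread, the token-lifetime idea from 3/5 can be
sketched in user space roughly as follows. This is only an illustration
under my own assumptions, not the driver code: struct pmem_req, req_alloc(),
req_get(), req_put(), complete_token(), demo_late_completion() and
freed_count are all hypothetical names, and C11 atomics stand in for the
kernel's refcounting. The point it shows is that the request holds one
reference for the submitter and one for the virtqueue token, so a late
completion never touches freed memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-request context; not the driver's struct. */
struct pmem_req {
	atomic_int refs;	/* one ref for the submitter, one for the token */
	bool done;		/* set by completion (or by a drain) */
	int ret;		/* result reported to the waiter */
};

static int freed_count;	/* counts frees so the lifetime can be checked */

static struct pmem_req *req_alloc(void)
{
	struct pmem_req *req = calloc(1, sizeof(*req));

	if (req)
		atomic_init(&req->refs, 1);	/* submitter's reference */
	return req;
}

static void req_get(struct pmem_req *req)
{
	atomic_fetch_add(&req->refs, 1);
}

static void req_put(struct pmem_req *req)
{
	/* atomic_fetch_sub() returns the old value: 1 means this was the last ref. */
	if (atomic_fetch_sub(&req->refs, 1) == 1) {
		free(req);
		freed_count++;
	}
}

/* Completion side: writes to the request, then drops the token reference. */
static void complete_token(struct pmem_req *req, int ret)
{
	req->ret = ret;
	req->done = true;
	req_put(req);
}

/* Returns 0 if the request outlived the submitter and was freed exactly once. */
static int demo_late_completion(void)
{
	struct pmem_req *req = req_alloc();
	int freed_before = freed_count;

	if (!req)
		return -ENOMEM;

	req_get(req);	/* request is now reachable as a virtqueue token */

	/* Submitter gives up early (e.g. notify failed) and drops its ref. */
	req_put(req);
	if (freed_count != freed_before)
		return -1;	/* would have been a use-after-free below */

	/* Late completion still sees valid memory; the last put frees it. */
	complete_token(req, -EIO);
	return freed_count == freed_before + 1 ? 0 : -1;
}
```

Calling demo_late_completion() walks the failure ordering from the cover
letter: the submitter returns first, the completion arrives afterwards, and
the object is freed only by the final put.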
Regards,
Li