Re: [PATCH 2/2] fuse: wait for aborted connection before releasing last fuse_dev
From: Bernd Schubert
Date: Mon May 18 2026 - 05:58:05 EST
On 5/18/26 03:13, Berkant Koc wrote:
>
> Bernd, thanks for pushing back. Stepping through this against the trace:
>
> fuse_conn_destroy() in fs/fuse/inode.c calls fuse_wait_aborted()
> between fuse_abort_conn() and the eventual fuse_conn_put() (from
> fuse_sb_destroy). fuse_dev_release() in fs/fuse/dev.c does not wait
> between its fuse_abort_conn() and fuse_conn_put(). That asymmetry is
> the race.
>
> On topologies where the last fud release IS the last conn ref
> (no superblock mount, no other fud open — exactly the PoC setup),
> fuse_conn_put() drops the count to zero, call_rcu schedules
> delayed_release, and fuse_uring_destruct kfrees ring/queue/ent_released
> slabs. async_teardown_work, scheduled by fuse_uring_async_stop_queues
> via the teardown-interval delayed_work, then runs on freed memory.
>
> The KASAN trace at top-finding/kasan-trace.txt shows exactly that
> interleaving:
>
>   free site: fuse_uring_destruct ← delayed_release ← rcu_core
>   use site:  fuse_uring_teardown_all_queues ← async_teardown_work
>              (workqueue), reading ent->list.next from
>              kmalloc-192 freed by destruct
>
> Your in-flight cmd ref invariant holds on both fixed and non-fixed
> paths (non-fixed via per-cmd io_put_file in io_free_batch_list, fixed
> via the io_uring file table slot pinning struct file → fud → fuse_conn).
> But neither covers the gap between fuse_abort_conn (which schedules
> the async work and returns immediately) and the RCU callback. The
> PoC topology removes every other ref-holder, so that gap becomes the
> last conn ref.
>
> The patch restores symmetry with fuse_conn_destroy by waiting on
> ring->queue_refs == 0 (via fuse_wait_aborted → fuse_uring_wait_stopped_queues)
> before the put. That guarantees async_teardown_work has finished
> before RCU is armed.
>
> The race is reproducible with mdelay-widening; without widening I see
> 0 trips in 50 iterations, but the window exists in the code paths.

I think I see what the actual issue is: we need to hold an fc (or, in
linux-next, struct fuse_chan) reference for as long as
fuse_uring_async_stop_queues() runs. Patch follows.

Thanks,
Bernd
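
For illustration only, here is a minimal single-threaded userspace model of
the lifetime problem discussed above. It is not the patch and not kernel
code. Every name in it (struct conn, struct teardown_work,
simulate_last_release(), and so on) is hypothetical; "released" merely marks
the point where the kernel would free the ring memory, so the model never
touches freed memory itself. It only shows why deferred work that does not
pin the connection can run after the last put, and why taking a reference
when the work is scheduled closes that gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection: a refcount plus a marker. */
struct conn {
	int refs;
	bool released;	/* set where the kernel would free ring memory */
};

/* Hypothetical stand-in for the delayed teardown work item. */
struct teardown_work {
	struct conn *conn;
	bool pending;
	bool pins_conn;		/* scheduling took its own reference */
	bool ran_after_release;	/* records the modelled use-after-free */
};

static void conn_get(struct conn *c)
{
	c->refs++;
}

static void conn_put(struct conn *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->released = true;
}

/* Model of the abort path: queue the work, optionally pinning the conn. */
static void schedule_teardown(struct teardown_work *w, struct conn *c,
			      bool pin)
{
	w->conn = c;
	w->pending = true;
	w->pins_conn = pin;
	w->ran_after_release = false;
	if (pin)
		conn_get(c);
}

/* Model of the workqueue running the deferred teardown later on. */
static void run_teardown(struct teardown_work *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	/* In the kernel this would be the read of freed memory. */
	w->ran_after_release = w->conn->released;
	if (w->pins_conn)
		conn_put(w->conn);
}

/*
 * Replay the problematic ordering: schedule work, drop the last external
 * reference, then let the work run. Returns true if the work ran after
 * the connection had already been released.
 */
bool simulate_last_release(bool pin)
{
	struct conn c = { .refs = 1, .released = false };
	struct teardown_work w = { 0 };

	schedule_teardown(&w, &c, pin);
	conn_put(&c);		/* last device release drops its reference */
	run_teardown(&w);	/* deferred work runs afterwards */
	return w.ran_after_release;
}
```

Calling simulate_last_release(false) reports the work running after release,
while simulate_last_release(true) does not, because the work's own reference
keeps the count above zero until the work has finished.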