[PATCH 0/2] fuse: fix and optimize parallel writes on passthrough mounts

From: Russ Fellows
Date: Thu May 28 2026 - 23:19:29 EST

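These two patches make FOPEN_PARALLEL_DIRECT_WRITES effective on FUSE
passthrough mounts. Patch 1/2 fixes a bug that causes the flag to be ignored,
which prevents write IOPS from scaling with concurrency; patch 2/2 reduces the
lock contention that remains once writes actually run in parallel.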

The bug has two sides that must be fixed together (patch 1/2):

(a) fuse_passthrough_write_iter() calls inode_lock() directly, bypassing
the fuse_dio_lock() function that checks FOPEN_PARALLEL_DIRECT_WRITES.

(b) fuse_file_io_open() strips FOPEN_PARALLEL_DIRECT_WRITES from any open
that lacks FOPEN_DIRECT_IO. That includes passthrough opens, for which
FOPEN_DIRECT_IO is redundant and should not be required.

Either side alone is enough to serialize all writers; together they guarantee
that the flag can never take effect on a passthrough-backed file.

Patch 2/2 is an independent performance follow-up: once parallel writes are
unblocked, the per-inode spinlock (fi->lock) becomes the next measurable
bottleneck. This patch converts iocachectr to atomic_t and adds lockless fast
paths to fuse_inode_uncached_io_start/end and fuse_write_update_attr.

Tested with fio randwrite 4K direct, numjobs=1/2/4/8, iodepth=64, on a FUSE
passthrough mount backed by XFS on a RAM-backed null_blk device (kernel 6.17.13,
AMD EPYC 9B45, 16 threads):

numjobs | Before (IOPS) | After patch 1 (IOPS) | After patch 2 (IOPS)
--------|---------------|----------------------|---------------------
1       | ~470K         | ~470K                | ~470K
2       | ~650K         | ~930K                | ~930K
4       | ~640K         | ~1,530K              | ~1,530K
8       | ~635K         | ~1,707K              | ~1,707K

Raw XFS throughput on the same device at numjobs=8: ~1,702K IOPS.

Russ Fellows (2):
fuse: fix FOPEN_PARALLEL_DIRECT_WRITES being ignored for passthrough writes
fuse: reduce fi->lock contention on parallel direct I/O

fs/fuse/file.c | 49 ++++++++++++++++++++++-----
fs/fuse/fuse_i.h | 11 +++++-
fs/fuse/inode.c | 2 +-
fs/fuse/iomode.c | 61 +++++++++++++++++++++++++++------
fs/fuse/passthrough.c | 6 ++--
5 files changed, 104 insertions(+), 25 deletions(-)

--
2.51.0