Re: [PATCH V10 00/10] famfs: port into fuse
From: Joanne Koong
Date: Mon Apr 06 2026 - 13:44:06 EST
On Tue, Mar 31, 2026 at 5:37 AM John Groves <john@xxxxxxxxxxxxxx> wrote:
>
> From: John Groves <john@xxxxxxxxxx>
>
> NOTE: this series depends on the famfs dax series in Ira's for-7.1/dax-famfs
> branch [0]
>
> Changes v9 -> v10
> - Rebased to Ira's for-7.1/dax-famfs branch [0], which contains the required
> dax patches
> - Add parentheses around the argument of the FUSE_IS_VIRTIO_DAX() macro,
> in case an expression is passed in as fuse_inode (thanks Jonathan's AI)
>
> Description:
>
> This patch series introduces famfs into the fuse file system framework.
> Famfs depends on the dax patch set in the branch noted above [0].
>
> The famfs user space code can be found at [1].
>
> Fuse Overview:
>
> Famfs started as a standalone file system, but this series is intended to
> permanently supersede that implementation. At a high level, famfs adds
> two new fuse server messages:
>
> GET_FMAP - Retrieves a famfs fmap (the file-to-dax map for a famfs
> file)
> GET_DAXDEV - Retrieves the details of a particular daxdev that was
> referenced by an fmap
>
> Famfs Overview
>
> Famfs exposes shared memory as a file system. Famfs consumes shared
> memory from dax devices, and provides memory-mappable files that map
> directly to the memory - no page cache involvement. Famfs differs from
> conventional file systems in fs-dax mode, in that it handles in-memory
> metadata in a sharable way (which begins with never caching dirty shared
> metadata).
>
> Famfs started as a standalone file system [2,3], but the consensus at
> LSFMM was that it should be ported into fuse [4,5].
>
> The key performance requirement is that famfs must resolve mapping faults
> without upcalls. This is achieved by fully caching the file-to-devdax
> metadata for all active files. This is done via two fuse client/server
> message/response pairs: GET_FMAP and GET_DAXDEV.
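To show why caching the complete fmap lets faults be resolved without an upcall, here is a minimal userspace C sketch. It assumes a simple, non-interleaved list of extents laid out back to back in file order. The names demo_extent, demo_fmap and demo_resolve are hypothetical and do not reflect the actual layout in famfs_kfmap.h or include/uapi/linux/fuse.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simple extent: a contiguous run of file bytes stored at
 * dev_offset on the daxdev identified by dev_index. */
struct demo_extent {
	uint32_t dev_index;  /* index into the daxdev table */
	uint64_t dev_offset; /* byte offset within that daxdev */
	uint64_t len;        /* extent length in bytes */
};

/* Hypothetical fully cached fmap: extents back to back in file order. */
struct demo_fmap {
	size_t nextents;
	const struct demo_extent *ext;
};

/*
 * Translate a file offset into (daxdev index, device offset) using only
 * the cached fmap, i.e. without asking the fuse server. Returns 0 on
 * success, -1 if the offset lies beyond the mapped size of the file.
 */
static int demo_resolve(const struct demo_fmap *fmap, uint64_t file_off,
			uint32_t *dev_index, uint64_t *dev_off)
{
	assert(fmap != NULL && dev_index != NULL && dev_off != NULL);

	for (size_t i = 0; i < fmap->nextents; i++) {
		const struct demo_extent *e = &fmap->ext[i];

		if (file_off < e->len) {
			*dev_index = e->dev_index;
			*dev_off = e->dev_offset + file_off;
			return 0;
		}
		file_off -= e->len; /* skip past this extent */
	}
	return -1;
}
```

With a 2 MiB extent on daxdev 0 followed by a 2 MiB extent on daxdev 1, an offset of 0x201000 resolves to offset 0x1000 on daxdev 1. The lookup is a pure in-memory walk, which is the property the cover letter relies on for fault handling.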
>
> Famfs remains the first fs-dax file system that is backed by devdax
> rather than pmem in fs-dax mode (hence the need for the new dax mode).
>
> Notes
>
> - When a file is opened in a famfs mount, the OPEN is followed by a
> GET_FMAP message and response. The "fmap" is the full file-to-dax
> mapping, allowing the fuse/famfs kernel code to handle
> read/write/fault without any upcalls.
>
> - After each GET_FMAP, the fmap is checked for extents that reference
> previously-unknown daxdevs. Each such occurrence is handled with a
> GET_DAXDEV message and response.
>
> - Daxdevs are stored in a table (which might become an xarray at some
> point). When entries are added to the table, we acquire exclusive
> access to the daxdev via the fs_dax_get() call (modeled after how
> fs-dax handles this with pmem devices). Famfs registers
> holder_operations with devdax, which provides a notification path in
> the event of memory errors or forced reconfiguration.
>
> - If devdax notifies famfs of memory errors on a dax device, famfs
> currently blocks all subsequent accesses to data on that device. The
> recovery is to re-initialize the memory and file system. Famfs is
> memory, not storage...
>
> - Because famfs uses backing (devdax) devices, only privileged mounts are
> supported (i.e. the fuse server requires CAP_SYS_RAWIO).
>
> - The famfs kernel code never accesses the memory directly - it only
> facilitates read, write and mmap on behalf of user processes, using
> fmap metadata provided by its privileged fuse server. As such, the
> RAS of the shared memory affects applications, but not the kernel.
>
> - Famfs has backing device(s), but they are devdax (char) rather than
> block. Right now there is no way to tell the vfs layer that famfs has a
> char backing device (unless we say it's block, but it's not). Currently
> we use the standard anonymous fuse fs_type - but I'm not sure that's
> ultimately optimal (thoughts?)
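The GET_FMAP / GET_DAXDEV sequence described in the notes above can be sketched as a simple scan. This is a hypothetical userspace model, not the famfs code: the daxdev table is a fixed-size array (the notes say the real table might become an xarray), the GET_DAXDEV round trip is stood in for by a callback, and all demo_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DAXDEVS 64 /* hypothetical table size */

struct demo_extent {
	uint32_t dev_index;
	uint64_t dev_offset;
	uint64_t len;
};

/* Hypothetical daxdev table: which device indices are already known. */
struct demo_daxdev_table {
	bool known[DEMO_MAX_DAXDEVS];
};

/* Stand-in for a GET_DAXDEV round trip to the fuse server; 0 on success. */
typedef int (*demo_get_daxdev_fn)(uint32_t dev_index);

/* Stub "server" that always succeeds, for illustration only. */
static int demo_fake_get_daxdev(uint32_t dev_index)
{
	(void)dev_index;
	return 0;
}

/*
 * After a GET_FMAP, find extents that reference daxdevs not yet in the
 * table and fetch each such daxdev exactly once. Returns the number of
 * GET_DAXDEV requests issued, or -1 on a bad index or failed fetch.
 */
static int demo_resolve_new_daxdevs(struct demo_daxdev_table *tbl,
				    const struct demo_extent *ext, size_t n,
				    demo_get_daxdev_fn get_daxdev)
{
	int issued = 0;

	assert(tbl != NULL && get_daxdev != NULL);
	for (size_t i = 0; i < n; i++) {
		uint32_t idx = ext[i].dev_index;

		if (idx >= DEMO_MAX_DAXDEVS)
			return -1;
		if (tbl->known[idx])
			continue; /* already in the table: no message needed */
		if (get_daxdev(idx) != 0)
			return -1;
		/* In famfs, this is where exclusive access would be taken
		 * via fs_dax_get(); that step is not modeled here. */
		tbl->known[idx] = true;
		issued++;
	}
	return issued;
}
```

Because a daxdev is marked known after its first fetch, a second fmap that reuses the same devices triggers no further GET_DAXDEV messages.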
>
> Changes v8 -> v9
> - Kconfig: fs/fuse/Kconfig:CONFIG_FUSE_FAMFS_DAX now depends on the
> new CONFIG_DEV_DAX_FSDEV (from drivers/dax/Kconfig) rather than
> just CONFIG_DEV_DAX and CONFIG_FS_DAX. (CONFIG_FUSE_FAMFS_DAX
> depends on those...)
>
> Changes v7 -> v8
> - Moved to inline __free declarations in fuse_get_fmap(),
> famfs_fuse_meta_alloc() and famfs_teardown()
> - Adopted FIELD_PREP() macro rather than manual bitfield manipulation
> - Minor doc edits
> - I dropped adding magic numbers to include/uapi/linux/magic.h. That
> can be done later if appropriate
>
> Changes v6 -> v7
> - Fixed a regression in famfs_interleave_fileofs_to_daxofs() that
> was reported by Intel's kernel test robot
> - Added a check in __fsdev_dax_direct_access() for negative return
> from pgoff_to_phys(), which would indicate an out-of-range offset
> - Fixed a bug in __famfs_meta_free(), where not all interleaved
> extents were freed
> - Added chunksize alignment checks in famfs_fuse_meta_alloc() and
> famfs_interleave_fileofs_to_daxofs() as interleaved chunks must
> be PTE or PMD aligned
> - Simplified famfs_file_init_dax() a bit
> - Re-ran CM's kernel code review prompts on the entire series and
> fixed several minor issues
>
> Changes v4 -> v5 -> v6
> - None. Re-sending due to technical difficulties
>
> Changes v3 [9] -> v4
> - The patch "dax: prevent driver unbind while filesystem holds device"
> has been dropped. Dan Williams indicated that the favored behavior is
> for a file system to stop working if an underlying driver is unbound,
> rather than preventing the unbind.
> - The patch "famfs_fuse: Famfs mount opt: -o shadow=<shadowpath>" has
> been dropped. Found a way for the famfs user space to do without the
> -o opt (via getxattr).
> - Squashed the fs/fuse/Kconfig patch into the first subsequent patch
> that needed the change
> ("famfs_fuse: Basic fuse kernel ABI enablement for famfs")
> - Many review comments addressed.
> - Addressed minor kerneldoc infractions reported by the kernel test robot.
>
> Changes v2 [7] -> v3
> - Dax: Completely new fsdev driver (drivers/dax/fsdev.c) replaces the
> dev_dax_iomap modifications to bus.c/device.c. Devdax devices can now
> be switched among 'devdax', 'famfs' and 'system-ram' modes via daxctl
> or sysfs.
> - Dax: fsdev uses MEMORY_DEVICE_FS_DAX type and leaves folios at order-0
> (no vmemmap_shift), allowing fs-dax to manage folio lifecycles
> dynamically like pmem does.
> - Dax: The "poisoned page" problem is properly fixed via
> fsdev_clear_folio_state(), which clears stale mapping/compound state
> when fsdev binds. The temporary WARN_ON_ONCE workaround in fs/dax.c
> has been removed.
> - Dax: Added dax_set_ops() so fsdev can set dax_operations at bind time
> (and clear them on unbind), since the dax_device is created before we
> know which driver will bind.
> - Dax: Added custom bind/unbind sysfs handlers; unbind returns -EBUSY if a
> filesystem holds the device, preventing unbind while famfs is mounted.
> - Fuse: Famfs mounts now require that the fuse server/daemon has
> CAP_SYS_RAWIO because they expose raw memory devices.
> - Fuse: Added DAX address_space_operations with noop_dirty_folio since
> famfs is memory-backed with no writeback required.
> - Rebased to latest kernels, fully compatible with Alistair Popple
> et al.'s recent dax refactoring [6].
> - Ran this series through Chris Mason's code review AI prompts to check
> for issues - several subtle problems found and fixed.
> - Dropped RFC status - this version is intended to be mergeable.
>
> Changes v1 [8] -> v2:
>
> - The GET_FMAP message/response has been moved from LOOKUP to OPEN, per
> the nearly unanimous consensus.
> - Made the response payload to GET_FMAP variable sized (patch 12)
> - Dodgy kerneldoc comments cleaned up or removed.
> - Fixed memory leak of fc->shadow in patch 11 (thanks Joanne)
> - Dropped many pr_debug and pr_notice calls
>
>
> References
>
> [0] - https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/
> [1] - https://famfs.org (famfs user space)
> [2] - https://lore.kernel.org/linux-cxl/cover.1708709155.git.john@xxxxxxxxxx/
> [3] - https://lore.kernel.org/linux-cxl/cover.1714409084.git.john@xxxxxxxxxx/
> [4] - https://lwn.net/Articles/983105/ (lsfmm 2024)
> [5] - https://lwn.net/Articles/1020170/ (lsfmm 2025)
> [6] - https://lore.kernel.org/linux-cxl/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.1740713401.git-series.apopple@xxxxxxxxxx/
> [7] - https://lore.kernel.org/linux-fsdevel/20250703185032.46568-1-john@xxxxxxxxxx/ (famfs fuse v2)
> [8] - https://lore.kernel.org/linux-fsdevel/20250421013346.32530-1-john@xxxxxxxxxx/ (famfs fuse v1)
> [9] - https://lore.kernel.org/linux-fsdevel/20260107153244.64703-1-john@xxxxxxxxxx/T/#mb2c868801be16eca82dab239a1d201628534aea7 (famfs fuse v3)
>
>
> John Groves (10):
> famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/
> famfs_fuse: Basic fuse kernel ABI enablement for famfs
> famfs_fuse: Plumb the GET_FMAP message/response
> famfs_fuse: Create files with famfs fmaps
> famfs_fuse: GET_DAXDEV message and daxdev_table
> famfs_fuse: Plumb dax iomap and fuse read/write/mmap
> famfs_fuse: Add holder_operations for dax notify_failure()
> famfs_fuse: Add DAX address_space_operations with noop_dirty_folio
> famfs_fuse: Add famfs fmap metadata documentation
> famfs_fuse: Add documentation
>
> Documentation/filesystems/famfs.rst | 142 ++++
> Documentation/filesystems/index.rst | 1 +
> MAINTAINERS | 10 +
> fs/fuse/Kconfig | 13 +
> fs/fuse/Makefile | 1 +
> fs/fuse/dir.c | 2 +-
> fs/fuse/famfs.c | 1180 +++++++++++++++++++++++++++
> fs/fuse/famfs_kfmap.h | 167 ++++
> fs/fuse/file.c | 45 +-
> fs/fuse/fuse_i.h | 116 ++-
> fs/fuse/inode.c | 35 +-
> fs/fuse/iomode.c | 2 +-
> fs/namei.c | 1 +
> include/uapi/linux/fuse.h | 88 ++
> 14 files changed, 1790 insertions(+), 13 deletions(-)
> create mode 100644 Documentation/filesystems/famfs.rst
> create mode 100644 fs/fuse/famfs.c
> create mode 100644 fs/fuse/famfs_kfmap.h
>
>
> base-commit: 2ae624d5a555d47a735fb3f4d850402859a4db77
> --
> 2.53.0
>
Hi John,

I'm curious whether you think it makes sense to move the famfs-specific
logic in this series into a bpf program that goes through a generic fuse
iomap dax layer.

Based on [1], this gives feature parity with the famfs logic in this
series. In my opinion, routing famfs through a generic fuse iomap dax
layer makes the fuse kernel code more extensible for future servers that
will also want to use dax iomap, and keeps the fuse code cleaner by
avoiding hardcoded famfs-specific logic and new famfs-specific fuse
uapis. As I understand it, fuse is meant to be generic, and adding
server-specific logic feels like it goes against that design philosophy
and sets a precedent for other servers wanting similar special-casing in
the future. I'd like to explore whether the bpf and generic fuse iomap
dax layer approach can preserve that philosophy while still giving famfs
the flexibility it needs.

I think moving the famfs logic to bpf benefits famfs as well:
- Instead of the kernel issuing a FUSE_GET_FMAP request after a file is
opened, the server can populate the metadata map directly from userspace
with the mapping info when it processes the FUSE_OPEN request, which
eliminates the extra round trip
- The server can dynamically update the metadata / bpf maps during
runtime from userspace if any mapping info needs to change
- Future code changes / updates for famfs are all server-side and can
be deployed immediately instead of needing to go through the upstream
kernel mailing list process
- Famfs updates / new releases can ship independently of kernel releases

I'd appreciate the chance to discuss the tradeoffs; if you'd rather
discuss this at the fuse BoF at LSF, that sounds great too.

Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1YMqDKA5gDZasrxGjJtfdbhmjxX5uhUv=OSPyA=G5EE+Q@xxxxxxxxxxxxxx/